David Chudnovsky Gregory Chudnovsky 
Melvyn B. Nathanson Editors 


Number Theory 


New York Seminar 2003 


Springer




Springer Science+Business Media, LLC 




David Chudnovsky
Gregory Chudnovsky
Department of Mathematics
Polytechnic University
6 Metrotech Center
Brooklyn, NY 11201
USA

Melvyn Nathanson
Department of Mathematics & Computer Science
Lehman College
250 Bedford Park Blvd. West
Bronx, NY 10468
USA


Library of Congress Cataloging-in-Publication Data 
New York Number Theory Seminar (2003) 
Number theory: New York seminar 2003 / David Chudnovsky, Gregory Chudnovsky, Melvyn 
Nathanson. 
p. cm.
ISBN 978-1-4612-6490-3 ISBN 978-1-4419-9060-0 (eBook) 
DOI 10.1007/978-1-4419-9060-0 
1. Number theory--Congresses. I. Chudnovsky, D. (David), 1947- II. Chudnovsky, G.
(Gregory), 1952- III. Nathanson, Melvyn B. (Melvyn Bernard), 1944- IV. Title.
QA241.N475 2003 
512.7—dc22 2003058458 


ISBN 978-1-4612-6490-3 Printed on acid-free paper. 
© 2004 Springer Science+Business Media New York 


Originally published by Springer-Verlag New York, Inc. in 2004 
Softcover reprint of the hardcover 1st edition 2004 


All rights reserved. This work may not be translated or copied in whole or in part without the written 


permission of the publisher Springer Science+Business Media, LLC, 
except for brief excerpts in connection with reviews or scholarly analysis. Use in 


connection with any form of information storage and retrieval, electronic adaptation, computer 


software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. 


The use in this publication of trade names, trademarks, service marks, and similar terms, even if 
they are not identified as such, is not to be taken as an expression of opinion as to whether or not 


they are subject to proprietary rights. 


987654321 SPIN 10944363 


Springeronline.com 


Preface 


This volume marks the 20th anniversary of the New York Number Theory Sem- 
inar (NYNTS). The seminar began to meet in the Spring, 1982 semester at the 
CUNY Graduate Center in midtown Manhattan, and has been meeting contin- 
uously at the Graduate Center for two decades, even as the Graduate Center 
moved from its original location on 42nd Street near Fifth Avenue to tempo- 
rary quarters in an office building next to Grand Central Station to a new and 
elegant building in the former B. Altman department store on Fifth Avenue 
between 34th and 35th Streets.

The seminar was originally organized by Harvey Cohn, David and Gregory 
Chudnovsky, and Melvyn B. Nathanson. In 1982, Harvey Cohn was at City 
College (CUNY) and the Graduate Center, the Chudnovskys were at Columbia, 
and Mel Nathanson was at Rutgers. Today, Harvey has retired to California, 
the Chudnovskys are at Polytechnic University of New York, and Nathanson is
at Lehman College (CUNY) and the Graduate Center. 

For 20 years the NYNTS has tried to present the broad spectrum of number 
theory and related fields of mathematics, from physics to geometry to computer 
science. Mathematics, like other sciences, is a mafia-run enterprise, where the 
local mafias represent currently fashionable fields of research, almost always 
important, but never nearly as important as the reigning dons believe. We have 
always tried to invite not only Fields medalists and other standard speakers,
but also mathematicians, especially younger and relatively unknown researchers, 
whose theorems are significant and whose work might become the next big thing 
in number theory. We do not attempt to predict the future, but we consciously 
strive to include speakers who are out of the mainstream. 

Since its inception, the proceedings of the New York Number Theory Semi- 
nar have been published by Springer-Verlag, and we are grateful to Springer and 
its mathematics editors for their support of the Seminar. We thank Ina Linde- 
mann, the current mathematics editor of Springer, for her continuing interest 
in NYNTS. 

At various times in the past 20 years the seminar has been supported by 
grants from the NSA Mathematical Sciences Program, the Number Theory 
Foundation, and the Office of the Provost of Lehman College, and we appreciate 
their support. 


Contents


Preface

1 The Spanning Number and the Independence Number of a Subset of an Abelian Group
Béla Bajnok

2 A Formula Related to the Frobenius Problem in Two Dimensions
Matthias Beck and Sinai Robins

3 One Bit World
David V. Chudnovsky and Gregory V. Chudnovsky

4 Use of Padé Approximation in Spline Construction
David V. Chudnovsky and Gregory V. Chudnovsky

5 Interactions between Number Theory and Operator Algebras in the Study of the Riemann Zeta Function (d'après Bost-Connes and Connes)
Paula B. Cohen

6 A Hyperelliptic Curve with Real Multiplication of Degree Two
Harvey Cohn

7 Humbert's Conic Model and the Kummer Surface
Harvey Cohn

8 Arithmeticity and Theta Correspondence of an Orthogonal Group
Ze-Li Dou

9 Morphic Heights and Periodic Points
M. Einsiedler, G. Everest, and T. Ward

10 The Elementary Proof of the Prime Number Theorem: An Historical Perspective
D. Goldfeld

11 Additive Bases Representations and the Erdős-Turán Conjecture
G. Grekos, L. Haddad, C. Helou, and J. Pihko

12 The Boundary Structure of the Sumset in Z^2
Shu-Ping Sandie Han

13 On NTUs in Function Fields
Howard Kleiman

14 Continued Fractions and Quadratic Irrationals
Joseph Lewittes

15 The Inverse Problem for Representation Functions of Additive Bases
Melvyn B. Nathanson

16 On the Ubiquity of Sidon Sets
Melvyn B. Nathanson



The spanning number and the independence number of a subset of 
an abelian group 


Béla Bajnok 


Department of Mathematics, Gettysburg College 
Gettysburg, PA 17325-1486 USA 
E-mail: bbajnok@gettysburg.edu 


April 29, 2003 


Abstract 


Let $A = \{a_1, a_2, \ldots, a_m\}$ be a subset of a finite abelian group $G$. We call $A$ $t$-independent
in $G$, if whenever
\[ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_m a_m = 0 \]
for some integers $\lambda_1, \lambda_2, \ldots, \lambda_m$ with
\[ |\lambda_1| + |\lambda_2| + \cdots + |\lambda_m| \le t, \]
we have $\lambda_1 = \lambda_2 = \cdots = \lambda_m = 0$, and we say that $A$ is $s$-spanning in $G$, if every element $g$ of $G$
can be written as
\[ g = \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_m a_m \]
for some integers $\lambda_1, \lambda_2, \ldots, \lambda_m$ with
\[ |\lambda_1| + |\lambda_2| + \cdots + |\lambda_m| \le s. \]

In this paper we give an upper bound for the size of a $t$-independent set and a lower bound
for the size of an $s$-spanning set in $G$, and determine some cases when this extremal size occurs.
We also discuss an interesting connection to spherical combinatorics.


1 Introduction 


We illuminate our concepts by the following examples. 


Example 1 Consider the set $A = \{1, 4, 6, 9, 11\}$ in the cyclic group $G = \mathbb{Z}_{25}$. We are interested
in the degree to which this set is independent in $G$. We find, for example, that $1 + 4 + 4 - 9 = 0$
and $11 + 11 + 9 - 6 = 0$, but that such an equation with only three terms from $A$ cannot be found.
We therefore say that $A$ is 3-independent in $G$ and write $\mathrm{ind}(A) = 3$. It can be shown that $A$ is
optimal in each of the following regards:


D. Chudnovsky et al. (eds.), Number Theory 




© Springer-Verlag New York, Inc. 2004 


• no subset of $G$ of size $m > 5$ is 3-independent in $G$ (furthermore, $A$ is essentially the unique
3-independent set in $G$ of size 5);

• no subset of $G$ of size 5 is $t$-independent for $t > 3$ (that is, for $t > 3$, there will always be $t$,
not necessarily distinct, elements with a signed sum of 0); and

• $n = 25$ is the smallest odd number for which a 3-independent set of size 5 in $\mathbb{Z}_n$ exists. (In fact,
it can be shown that $\mathbb{Z}_n$ has a 3-independent set of size 5 if and only if $n = 20, 22, 24, 25, 26$,
or $n \ge 28$.)


The fact that $G = \mathbb{Z}_{25}$ has this relatively large 3-independent subset is due, as explained later,
to the fact that 25 has a prime divisor which is congruent to 5 mod 6.
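
The claims of Example 1 can be checked by exhaustive search. The following is a minimal sketch, not taken from the paper: the function names `is_t_independent` and `independence_number` are hypothetical, and the search enumerates all coefficient vectors, so it is only practical for small sets.

```python
from itertools import product


def is_t_independent(A, n, t):
    """Return True if no nontrivial signed combination of elements of A
    with total weight |l_1| + ... + |l_m| <= t vanishes in Z_n."""
    A = list(A)
    for coeffs in product(range(-t, t + 1), repeat=len(A)):
        weight = sum(abs(c) for c in coeffs)
        if 0 < weight <= t and sum(c * a for c, a in zip(coeffs, A)) % n == 0:
            return False
    return True


def independence_number(A, n):
    """Largest t for which a non-empty A is t-independent in Z_n.
    The loop terminates because n * a = 0 for every a in Z_n."""
    t = 0
    while is_t_independent(A, n, t + 1):
        t += 1
    return t


if __name__ == "__main__":
    print(independence_number([1, 4, 6, 9, 11], 25))  # 3
```

The relation $1 + 4 + 4 - 9 = 0$ quoted in the example is what stops the search at $t = 3$.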


Example 2 How can one place a finite number of points on the $d$-dimensional sphere $S^d \subset \mathbb{R}^{d+1}$
with the highest momentum balance? For the circle $S^1$, the answer is given by the vertices of a
regular polygon, but the issue is far more difficult for $d > 1$. For a positive integer $n$ and a set of
integers $A = \{a_1, a_2, \ldots, a_m\}$, define the set of $n$ points $X(A) = \{z_1, z_2, \ldots, z_n\}$ with
\[ z_j = \frac{1}{\sqrt{m}} \left( \cos\frac{2\pi j a_1}{n}, \sin\frac{2\pi j a_1}{n}, \ldots, \cos\frac{2\pi j a_m}{n}, \sin\frac{2\pi j a_m}{n} \right) \]
($j = 1, 2, \ldots, n$); thus, for example, for $n = 25$ and $A = \{1, 4, 6, 9, 11\}$, $X(A)$ is a set of 25 points
on the unit sphere $S^9$. It can be shown that this $X(A)$ is a spherical 3-design, that is, for every
polynomial $f : S^9 \to \mathbb{R}$ of total degree at most 3, the average value of $f$ on $S^9$ equals the arithmetic
average of $f$ on $X(A)$. We can also verify that $X(A)$ is optimal in that

• no set of 25 points is a $t$-design on $S^9$ for $t > 3$;
• no set of 25 points is a 3-design on $S^d$ for $d > 9$;
• $n = 25$ is the minimum odd size for which a 3-design on $S^9$ exists. (It was recently proved
that an $n$-point 3-design on $S^9$ exists if and only if $n = 20, 22, 24$, or $n \ge 25$.)


Example 3 Finally, consider $A = \{3, 4\}$ in $G = \mathbb{Z}_{25}$. Note that every element of $G$ can be
generated by a signed sum of at most three terms of $A$: $1 = 4 - 3$, $2 = 3 + 3 - 4$, ..., $24 = 3 - 4$. We
therefore call $A = \{3, 4\}$ a 3-spanning set in $G = \mathbb{Z}_{25}$, and write $\mathrm{span}(A) = 3$. Again, our example
is extremal; it can be shown that

• no subset of $G$ of size $m < 2$ is 3-spanning in $G$;
• no subset of $G$ of size 2 is $s$-spanning for $s < 3$; and
• $n = 25$ is the largest number for which a 3-spanning set of size 2 in $\mathbb{Z}_n$ exists. (Furthermore,
as we will see, $\mathbb{Z}_n$ has a 3-spanning set of size at most 2 for every $n \le 25$.)


In fact, this example has an even more distinguished property: every element of G can be 
written uniquely as a signed sum of at most 3 elements of A; we call such a set perfect. As a 
consequence of being a perfect 3-spanning set, A is also a maximum size 6-independent set in G. 


The fact that $G = \mathbb{Z}_{25}$ has a perfect spanning subset of size 2 is due to the fact that 25 is the sum
of two consecutive squares, as explained later.
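
Example 3 can be reproduced with a short enumeration. This is an illustrative sketch rather than the author's computation; the helper names `signed_sum_counts` and `spanning_number` are hypothetical. It lists every coefficient vector of weight at most $s$ and records which residues are reached and how often.

```python
from itertools import product


def signed_sum_counts(A, n, s):
    """For each residue g mod n, count the coefficient vectors (l_1, ..., l_m)
    with |l_1| + ... + |l_m| <= s whose signed sum equals g."""
    counts = {}
    for coeffs in product(range(-s, s + 1), repeat=len(A)):
        if sum(abs(c) for c in coeffs) <= s:
            g = sum(c * a for c, a in zip(coeffs, A)) % n
            counts[g] = counts.get(g, 0) + 1
    return counts


def spanning_number(A, n):
    """Smallest s for which A is s-spanning in Z_n."""
    for s in range(n + 1):
        if len(signed_sum_counts(A, n, s)) == n:
            return s
    raise ValueError("A does not generate Z_n")


if __name__ == "__main__":
    counts = signed_sum_counts([3, 4], 25, 3)
    print(spanning_number([3, 4], 25))   # 3
    print(set(counts.values()) == {1})   # True: every element is reached exactly once
```

That every residue is hit exactly once is the "perfect" property described in the example.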


In the subsequent sections of this paper we define and investigate the afore-mentioned concepts 
and statements. Topics similar to spanning numbers (e.g. h-bases) and independence numbers (e.g. 
sum-free sets, Sidon sets, and $B_h$ sequences) have been studied vigorously for a long time; see, for
example, [9], [13], [15], [20], [22], [23], [27], and various sections of Guy’s book [14]. For general 
references on spherical designs, see [4], [8], [10], [11], [12], [17], [21], and [25]. 


2 Spanning numbers 


Let G be a finite abelian group of order |G| = n, written in additive notation. We are interested 
in the degree to which a given subset of G spans G. More precisely, we introduce the following 
definition. 


Definition 1 Let $s$ be a non-negative integer and $A = \{a_1, a_2, \ldots, a_m\}$. We say that $A$ is an
$s$-spanning set in $G$, if every $g \in G$ can be written as
\[ g = \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_m a_m \]
for some integers $\lambda_1, \lambda_2, \ldots, \lambda_m$ with
\[ |\lambda_1| + |\lambda_2| + \cdots + |\lambda_m| \le s. \]
We call the smallest $s$ for which $A$ is $s$-spanning the spanning number of $A$ in $G$, and denote it
by $\mathrm{span}(A)$.


Equivalently, $A$ is an $s$-spanning subset of $G$ if for every element $g \in G$, we can find non-negative
integers $h$ and $k$ and elements $x$ and $y$ in $G$, so that $x$ is the sum of $h$ (not necessarily distinct)
elements of $A$, $y$ is the sum of $k$ (not necessarily distinct) elements of $A$, $h + k \le s$, and $g = x - y$.


The case $s = 0$ is trivial: the only group $G$ which has a 0-spanning set is the one with a single element;
therefore, we may assume that $s \ge 1$ and $n \ge 2$. Obviously, $A = G$ is an $s$-spanning subset of $G$ for
every $s \ge 1$. Here we are interested in small $s$-spanning sets in $G$; we denote the size of a minimum
$s$-spanning set of $G$ by $p(G, s)$.


For $s = 1$, it is clear that $\mathrm{span}(A) = 1$ if and only if, for each $g \in G$, $A$ contains at least one
of $g$ or $-g$; in particular, $A$ must contain every element of order 2. Let $O(G, 2)$ denote the set of
order 2 elements of $G$; with this notation we have
\[ p(G, 1) = |O(G, 2)| + \frac{|G \setminus O(G, 2) \setminus \{0\}|}{2} = \frac{n + |O(G, 2)| - 1}{2}. \qquad (1) \]

As a special case, for the cyclic group of order $n$ we have
\[ p(\mathbb{Z}_n, 1) = \lfloor n/2 \rfloor. \qquad (2) \]


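For $s \ge 2$, values of $p(G, s)$ seem difficult to establish, even in the case of the cyclic groups.
Computational data shows that
\[
p(\mathbb{Z}_n, 2) =
\begin{cases}
0 & \text{if } n = \mathbf{1}; \\
1 & \text{if } n = 2, 3, 4, \mathbf{5}; \\
2 & \text{if } n = 6, 7, \ldots, 12, \mathbf{13}; \\
3 & \text{if } n = 14, 15, \ldots, 21; \\
4 & \text{if } n = 22, 23, \ldots, 33, \text{ and } n = 35; \\
5 & \text{if } n = 34,\ n = 36, 37, \ldots, 49, \text{ and } n = 51;
\end{cases}
\qquad (3)
\]
and
\[
p(\mathbb{Z}_n, 3) =
\begin{cases}
0 & \text{if } n = \mathbf{1}; \\
1 & \text{if } n = 2, 3, \ldots, 6, \mathbf{7}; \\
2 & \text{if } n = 8, 9, \ldots, 24, \mathbf{25}; \\
3 & \text{if } n = 26, 27, \ldots, 50,\ n = 52, \text{ and } n = 55; \\
4 & \text{if } n = 51, 53, 54,\ n = 56, 57, \ldots, 100, \text{ and } n = 104.
\end{cases}
\qquad (4)
\]
(Values marked in bold-face will be discussed later.)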


As these values indicate, $p(\mathbb{Z}_n, s)$ is, in general, not a monotone function of $n$, though we believe
that
\[ p(s) := \lim_{n \to \infty} \frac{p(\mathbb{Z}_n, s)^s}{n} \]
exists for every $s$. The following theorem provides a lower bound for $p(G, s)$ which is of the order
$n^{1/s}$ as $n$ goes to infinity.


Theorem 2 Let $m$ and $s$ be positive integers, and define $a(m, s)$ recursively by $a(m, 0) = a(0, s) = 1$
and
\[ a(m, s) = a(m-1, s) + a(m, s-1) + a(m-1, s-1). \]

1. We have
\[ a(m, s) = \sum_{k=0}^{s} \binom{s}{k} \binom{m}{k} 2^k. \]

2. If $G$ has order $n$ and contains an $s$-spanning set of size $m$, then $n \le a(m, s)$.


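Proof. 1. Let us define
\[ a'(m, s) = \sum_{k=0}^{s} \binom{s}{k} \binom{m}{k} 2^k. \]
Clearly, $a'(m, 0) = a'(0, s) = 1$; below we prove that $a'(m, s)$ also satisfies the recursion.

We have
\[ a'(m-1, s-1) = \sum_{k=0}^{s-1} \binom{s-1}{k} \binom{m-1}{k} 2^k \]
and, using $\binom{m}{k} = \binom{m-1}{k} + \binom{m-1}{k-1}$,
\[ a'(m, s-1) = \sum_{k=0}^{s-1} \binom{s-1}{k} \binom{m}{k} 2^k = a'(m-1, s-1) + \sum_{k=1}^{s-1} \binom{s-1}{k} \binom{m-1}{k-1} 2^k. \]

Next, we add $a'(m-1, s)$ and $a'(m-1, s-1)$. Note that
\[ 2a'(m-1, s-1) = \sum_{k=0}^{s-1} \binom{s-1}{k} \binom{m-1}{k} 2^{k+1}, \]
and by replacing $k$ by $k - 1$, this sum becomes
\[ \sum_{k=1}^{s} \binom{s-1}{k-1} \binom{m-1}{k-1} 2^k. \]

Therefore,
\begin{align*}
a'(m-1, s) + a'(m, s-1) + a'(m-1, s-1)
&= a'(m-1, s) + 2a'(m-1, s-1) + \sum_{k=1}^{s-1} \binom{s-1}{k} \binom{m-1}{k-1} 2^k \\
&= \sum_{k=0}^{s} \binom{s}{k} \binom{m-1}{k} 2^k + \sum_{k=1}^{s} \left[ \binom{s-1}{k-1} + \binom{s-1}{k} \right] \binom{m-1}{k-1} 2^k \\
&= \sum_{k=0}^{s} \binom{s}{k} \binom{m-1}{k} 2^k + \sum_{k=1}^{s} \binom{s}{k} \binom{m-1}{k-1} 2^k \\
&= \sum_{k=0}^{s} \binom{s}{k} \binom{m}{k} 2^k = a'(m, s),
\end{align*}
and hence $a'(m, s) = a(m, s)$.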


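2. Assume that $A = \{a_1, \ldots, a_m\}$ is an $s$-spanning set in $G$ of size $m$, and let
\[ \Sigma = \{\lambda_1 a_1 + \cdots + \lambda_m a_m \mid \lambda_1, \ldots, \lambda_m \in \mathbb{Z},\ |\lambda_1| + \cdots + |\lambda_m| \le s\}. \]
We will count the elements in the index set
\[ I = \{(\lambda_1, \ldots, \lambda_m) \mid \lambda_1, \ldots, \lambda_m \in \mathbb{Z},\ |\lambda_1| + \cdots + |\lambda_m| \le s\} \]
as follows. For $k = 0, 1, 2, \ldots, m$, let $I_k$ be the set of those elements of $I$ where exactly $k$ of the
$m$ coordinates are non-zero. How many elements are in $I_k$? We can choose which $k$ of the $m$
coordinates are non-zero in $\binom{m}{k}$ ways; w.l.o.g. let these coordinates be $\lambda_1, \lambda_2, \ldots, \lambda_k$. Next, we
choose the values of $|\lambda_1|, |\lambda_2|, \ldots, |\lambda_k|$: since the sum of these $k$ positive integers is at most $s$, we
have $\binom{s}{k}$ choices. Finally, each of these coordinates can be positive or negative, and therefore
\[ |I_k| = \binom{m}{k} \binom{s}{k} 2^k \]
and
\[ |I| = \sum_{k=0}^{m} |I_k| = \sum_{k=0}^{m} \binom{m}{k} \binom{s}{k} 2^k = a(m, s). \]

Since $A$ is $s$-spanning in $G$, we must have $n = |\Sigma| \le |I| = a(m, s)$. □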


Theorem 2 thus provides a lower bound for the size of an $s$-spanning set in $G$ which is of the
order $n^{1/s}$ as $n$ goes to infinity.
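
As a sanity check on Theorem 2, the recursion, the binomial sum, and a direct count of the index set $I$ can be compared numerically. The sketch below is illustrative only; the names `a_rec`, `a_closed`, and `index_set_size` are hypothetical and do not appear in the paper.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def a_rec(m, s):
    """a(m, s) from the recursion with a(m, 0) = a(0, s) = 1."""
    if m == 0 or s == 0:
        return 1
    return a_rec(m - 1, s) + a_rec(m, s - 1) + a_rec(m - 1, s - 1)


def a_closed(m, s):
    """a(m, s) as the sum over k of C(s, k) * C(m, k) * 2^k."""
    return sum(comb(s, k) * comb(m, k) * 2**k for k in range(s + 1))


def index_set_size(m, s):
    """Number of integer vectors of length m with |l_1| + ... + |l_m| <= s."""
    return sum(
        1
        for v in product(range(-s, s + 1), repeat=m)
        if sum(abs(c) for c in v) <= s
    )


if __name__ == "__main__":
    print(a_rec(2, 3), a_closed(2, 3), index_set_size(2, 3))  # 25 25 25
```

The value $a(2, 3) = 25$ is the bound behind Example 3: a 3-spanning set of size 2 can exist only in groups of order at most 25.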


For exact values, we establish the following results. 


Proposition 3 Let $s \ge 1$ be an integer.

1. If $2 \le n \le 2s + 1$, then the set $\{1\}$ is $s$-spanning in $\mathbb{Z}_n$ and $p(\mathbb{Z}_n, s) = 1$.
2. If $2s + 2 \le n \le 2s^2 + 2s + 1$, then the set $\{s, s+1\}$ is $s$-spanning in $\mathbb{Z}_n$ and $p(\mathbb{Z}_n, s) = 2$.
3. If $n \ge 2s^2 + 2s + 2$, then $p(\mathbb{Z}_n, s) \ge 3$.


Proof. 1 is trivial. To prove 2, let
\[ \Sigma = \{\lambda_1 s + \lambda_2 (s+1) \mid \lambda_1, \lambda_2 \in \mathbb{Z},\ |\lambda_1| + |\lambda_2| \le s\}. \]
The elements of $\Sigma$ lie in the interval $[-(s^2 + s), (s^2 + s)]$ and, since the index set
\[ I = \{(\lambda_1, \lambda_2) \mid \lambda_1, \lambda_2 \in \mathbb{Z},\ |\lambda_1| + |\lambda_2| \le s\} \]
contains exactly $2s^2 + 2s + 1$ elements, it suffices to prove that no integer in $[-(s^2 + s), (s^2 + s)]$
can be written as an element of $\Sigma$ in two different ways. Indeed, it is an easy exercise to show that
\[ \lambda_1 s + \lambda_2 (s+1) = \lambda_1' s + \lambda_2' (s+1) \in \Sigma \]
implies $\lambda_1 = \lambda_1'$ and $\lambda_2 = \lambda_2'$; therefore, the set $\{s, s+1\}$ is $s$-spanning in $\mathbb{Z}_n$. As the $s$-span of a
single element can contain at most $2s + 1$ elements, for values $n \ge 2s + 2$ we must have $p(\mathbb{Z}_n, s) = 2$.
Statement 3 follows from Theorem 2 by noting that $a(2, s) = 2s^2 + 2s + 1$. □
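
Parts 1 and 2 of Proposition 3 are easy to confirm by machine for small $s$. The sketch below is only an illustration under that assumption (small $s$, brute-force enumeration); `is_s_spanning` and `check_proposition_3` are hypothetical names.

```python
from itertools import product


def is_s_spanning(A, n, s):
    """True if every residue mod n is a signed sum of at most s elements of A."""
    reached = set()
    for coeffs in product(range(-s, s + 1), repeat=len(A)):
        if sum(abs(c) for c in coeffs) <= s:
            reached.add(sum(c * a for c, a in zip(coeffs, A)) % n)
    return len(reached) == n


def check_proposition_3(s):
    """Check parts 1 and 2 of Proposition 3 for one value of s."""
    # Part 1: {1} is s-spanning in Z_n for 2 <= n <= 2s + 1.
    part1 = all(is_s_spanning([1], n, s) for n in range(2, 2 * s + 2))
    middle = range(2 * s + 2, 2 * s * s + 2 * s + 2)
    # Part 2: {s, s + 1} is s-spanning for 2s + 2 <= n <= 2s^2 + 2s + 1 ...
    part2 = all(is_s_spanning([s, s + 1], n, s) for n in middle)
    # ... and no single element is s-spanning there, so p(Z_n, s) = 2.
    no_single = all(
        not is_s_spanning([a], n, s) for n in middle for a in range(n)
    )
    return part1 and part2 and no_single


if __name__ == "__main__":
    print(all(check_proposition_3(s) for s in range(1, 5)))  # True
```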


Let us now examine the extremal cases of Theorem 2. 


Definition 4 Suppose that A is an s-spanning set of size m in G and that a(m,s) is defined as in 
Theorem 2. If |G| =n =a(m,s), then we say that A is a perfect s-spanning set in G. 


Cases where $\mathbb{Z}_n$ has a perfect $s$-spanning set for $s = 2$ and $s = 3$ are marked with bold-face in
(3) and (4). Trivially, the empty-set is a perfect $s$-spanning set in $\mathbb{Z}_1$ for every $s$. With (2) and
Proposition 3, we can exhibit some other perfect spanning sets in the cyclic group. 


Proposition 5 Let $m$, $n$, and $s$ be positive integers, and let $G = \mathbb{Z}_n$.

1. If $n = 2m + 1$, then the set $\{1, 2, \ldots, m\}$ is a perfect 1-spanning set in $G$.
2. If $n = 2s + 1$, then the set $\{1\}$ is a perfect $s$-spanning set in $G$.
3. If $n = 2s^2 + 2s + 1$, then the set $\{s, s+1\}$ is a perfect $s$-spanning set in $G$.


Note that the sets given in Proposition 5 are not unique: any element of the set in 1 can be
replaced by its negative; in 2, the set $\{a\}$ is perfect for every $a$ which is relatively prime to $n$; it
is not difficult to show that another example in 3 is provided by $A = \{1, 2s + 1\}$ (however, the set
$\{s, s+1\}$ in Proposition 3 cannot be replaced by $\{1, 2s + 1\}$). We could not find perfect spanning
sets for $s \ge 2$ and $m \ge 3$. It might be an interesting problem to find and classify all perfect
spanning sets.


3 Independence numbers 


As in the previous section, we let G be a finite abelian group of order |G| = n, written in additive 
notation, and suppose that A is a subset of G. Here we are interested in the degree to which A is 
independent in G. More precisely, we introduce the following definition. 


Definition 6 Let $t$ be a non-negative integer and $A = \{a_1, a_2, \ldots, a_m\}$. We say that $A$ is a
$t$-independent set in $G$, if whenever
\[ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_m a_m = 0 \]
for some integers $\lambda_1, \lambda_2, \ldots, \lambda_m$ with
\[ |\lambda_1| + |\lambda_2| + \cdots + |\lambda_m| \le t, \]
we have $\lambda_1 = \lambda_2 = \cdots = \lambda_m = 0$. We call the largest $t$ for which $A$ is $t$-independent the
independence number of $A$ in $G$, and denote it by $\mathrm{ind}(A)$.


Equivalently, $A$ is a $t$-independent set in $G$, if for all non-negative integers $h$ and $k$ with $h + k \le t$,
the sum of $h$ (not necessarily distinct) elements of $A$ can only equal the sum of $k$ (not necessarily
distinct) elements of $A$ in a trivial way, that is, $h = k$ and the two sums contain the same terms in
some order.


Here we are interested in the size of a maximum $t$-independent set in $G$; we denote this by
$q(G, t)$.


Since $0 \le \mathrm{ind}(A) \le n - 1$ holds for every subset $A$ of $G$ (so no subset is "completely" independent),
we see that $q(G, 0) = n$ and $q(G, n) = 0$. It is also clear that $\mathrm{ind}(A) = 0$ if and only if
$0 \in A$, hence
\[ q(G, 1) = n - 1. \qquad (5) \]

For the rest of this section we assume that $t \ge 2$.


We can easily determine the value of q(G,2) as well. First, note that A cannot contain any 
element of {0} U Ord(G, 2) (the elements of order at most 2); to get a maximum 2-independent set 
in G, take exactly one of each element or its negative in G \ Ord(G, 2) \ {0}, hence we have 


\[ q(G, 2) = \frac{n - |\mathrm{Ord}(G, 2)| - 1}{2}. \qquad (6) \]

As a special case, for the cyclic group of order $n$ we have
\[ q(\mathbb{Z}_n, 2) = \lfloor (n - 1)/2 \rfloor. \qquad (7) \]


Note that if $\mathrm{Ord}(G, 2) \cup \{0\} = G$ then $q(G, 2) = 0$; for $n \ge 2$ this occurs only for the elementary
abelian 2-group. If $\mathrm{Ord}(G, 2) \cup \{0\} \ne G$ then, since $\mathrm{Ord}(G, 2) \cup \{0\}$ is a subgroup of $G$, we have
$1 \le |\mathrm{Ord}(G, 2)| + 1 \le n/2$, and therefore we get the following.


Proposition 7 If $G$ is isomorphic to the elementary abelian 2-group, then $q(G, 2) = 0$. Otherwise
\[ \frac{1}{4} n \le q(G, 2) < \frac{1}{2} n. \]

Let us now consider t = 3. As before, if G does not contain elements of order at least 4, then 
q(G, 3) = 0; this occurs if and only if G is isomorphic to the elementary abelian p-group for p = 2 
or p = 3. In [3] we proved the following. 


Theorem 8 ([3]) If $G$ is isomorphic to the elementary abelian $p$-group for $p = 2$ or $p = 3$, then
$q(G, 3) = 0$. Otherwise
\[ \frac{1}{9} n \le q(G, 3) \le \frac{1}{4} n. \]

These bounds can be attained since $q(\mathbb{Z}_9, 3) = 1$ and $q(\mathbb{Z}_4, 3) = 1$.


For the cyclic group $\mathbb{Z}_n$ we can find explicit 3-independent sets as follows. For every $n$, the
odd integers which are less than $n/3$ form a 3-independent set; if $n$ is even, we can go up to (but
not including) $n/2$ as then the sum of two odd integers cannot equal $n$. We can do better in one
special case when $n$ is odd; namely, when $n$ has a prime divisor $p$ which is congruent to 5 mod 6,
one can show that the set
\[ \left\{ p i_1 + 2 i_2 + 1 \mid i_1 = 0, 1, \ldots, \frac{n}{p} - 1,\ i_2 = 0, 1, \ldots, \frac{p - 5}{6} \right\} \qquad (8) \]


is 3-independent. It is surprising that these examples cannot be improved, as we have the following 
exact values. 


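Theorem 9 ([3]) For the cyclic group $G = \mathbb{Z}_n$ we have
\[
q(\mathbb{Z}_n, 3) =
\begin{cases}
\lfloor n/4 \rfloor & \text{if $n$ is even,} \\
\left(1 + \frac{1}{p}\right) \frac{n}{6} & \text{if $n$ is odd, has prime divisors congruent to 5 (mod 6), and $p$ is the smallest such divisor,} \\
\lfloor n/6 \rfloor & \text{otherwise.}
\end{cases}
\]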
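
The formula of Theorem 9 and the set in (8) translate directly into code. The following is a hedged sketch, not part of the paper: the names `q3`, `set_8`, and the other helpers are hypothetical, and the brute-force independence test is only feasible for small sets.

```python
from itertools import product


def smallest_prime_divisor_5_mod_6(n):
    """Smallest divisor of n that is congruent to 5 mod 6, or None.
    Such a smallest divisor is automatically prime: any d = 5 (mod 6) has a
    prime factor that is 5 (mod 6), and that factor also divides n."""
    for d in range(5, n + 1, 6):
        if n % d == 0:
            return d
    return None


def q3(n):
    """q(Z_n, 3) according to the three cases of Theorem 9."""
    if n % 2 == 0:
        return n // 4
    p = smallest_prime_divisor_5_mod_6(n)
    if p is not None:
        return (p + 1) * n // (6 * p)  # equals (1 + 1/p) * n / 6
    return n // 6


def set_8(n, p):
    """The set (8): p*i1 + 2*i2 + 1 with 0 <= i1 < n/p and 0 <= i2 <= (p-5)/6."""
    return sorted(
        p * i1 + 2 * i2 + 1
        for i1 in range(n // p)
        for i2 in range((p - 5) // 6 + 1)
    )


def is_t_independent(A, n, t):
    """Brute-force test of t-independence in Z_n."""
    for coeffs in product(range(-t, t + 1), repeat=len(A)):
        weight = sum(abs(c) for c in coeffs)
        if 0 < weight <= t and sum(c * a for c, a in zip(coeffs, A)) % n == 0:
            return False
    return True


if __name__ == "__main__":
    A = set_8(25, 5)
    print(A, q3(25), is_t_independent(A, 25, 3))  # [1, 6, 11, 16, 21] 5 True
```

For $n = 25$ the set produced is $\{1, 6, 11, 16, 21\}$, which agrees with the set $\{1, 4, 6, 9, 11\}$ of Example 1 after replacing 16 and 21 by their negatives 9 and 4.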


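For $t \ge 4$, exact results seem more difficult. With the help of a computer, we generated the
following values.
\[
q(\mathbb{Z}_n, 4) =
\begin{cases}
0 & \text{if } n = \mathbf{1}, 2, 3, 4; \\
1 & \text{if } n = \mathbf{5}, 6, \ldots, 12; \\
2 & \text{if } n = \mathbf{13}, 14, \ldots, 26; \\
3 & \text{if } n = 27, 28, \ldots, 45, \text{ and } n = 47; \\
4 & \text{if } n = 46,\ n = 48, 49, \ldots, 68, \text{ and } n = 72, 73; \\
5 & \text{if } n = 69, 70, 71, \text{ and } n = 74, 75, \ldots, 102;
\end{cases}
\qquad (9)
\]
\[
q(\mathbb{Z}_n, 5) =
\begin{cases}
0 & \text{if } n = \mathbf{1}, 2, 3, 4, 5; \\
1 & \text{if } n = \mathbf{6}, 7, \ldots, 17, \text{ and } n = 19, 20; \\
2 & \text{if } n = \mathbf{18},\ n = 21, 22, \ldots, 37,\ n = 39, 40, 41,\ n = 43, 44, 45, 47; \\
3 & \text{if } n = \mathbf{38}, 42, 46,\ n = 48, 49, \ldots, 69,\ n = 71, 72, 73, 75, 76, 77, 79, 81, 83, 85, 87;
\end{cases}
\qquad (10)
\]
and
\[
q(\mathbb{Z}_n, 6) =
\begin{cases}
0 & \text{if } n = \mathbf{1}, 2, 3, \ldots, 6; \\
1 & \text{if } n = \mathbf{7}, 8, 9, \ldots, 24; \\
2 & \text{if } n = \mathbf{25}, 26, 27, \ldots, 69; \\
3 & \text{if } n = 70, 71, \ldots, 151, \text{ and } n = 153, 154, 155, 158, 159, 160.
\end{cases}
\qquad (11)
\]
(Values marked in bold-face will be discussed later.)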


Again we see that $q(\mathbb{Z}_n, t)$ is not, in general, a monotone function of $n$; although for even values
of $t$ the sequence seems to possess more regularity and we conjecture that
\[ \lim_{n \to \infty} \frac{q(\mathbb{Z}_n, t)^{t/2}}{n} \]
exists for every even $t$. The following theorem establishes an upper bound for $q(G, t)$ which is of
the order $n^{1/\lfloor t/2 \rfloor}$ as $n$ goes to infinity.

Theorem 10 Let $m$ and $t$ be positive integers, $t \ge 2$, and let us denote
\[
q(m, t) =
\begin{cases}
a(m, t/2) & \text{if $t$ is even,} \\
a(m, (t-1)/2) + a(m-1, (t-1)/2) & \text{if $t$ is odd,}
\end{cases}
\]
where $a(m, t)$ is defined in Theorem 2. If $G$ has order $n$ and contains a $t$-independent set of size
$m$, then $n \ge q(m, t)$.




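Proof. Assume that $A = \{a_1, \ldots, a_m\}$ is a $t$-independent set in $G$ of size $m$, and define
\[ \Sigma = \{\lambda_1 a_1 + \cdots + \lambda_m a_m \mid \lambda_1, \ldots, \lambda_m \in \mathbb{Z},\ |\lambda_1| + \cdots + |\lambda_m| \le \lfloor t/2 \rfloor\} \]
and
\[ I = \{(\lambda_1, \ldots, \lambda_m) \mid \lambda_1, \ldots, \lambda_m \in \mathbb{Z},\ |\lambda_1| + \cdots + |\lambda_m| \le \lfloor t/2 \rfloor\}. \]

As in the proof of Theorem 2, we have $|I| = a(m, \lfloor t/2 \rfloor)$. Since $A$ is a $t$-independent set in $G$,
the elements listed in $\Sigma$ must be all distinct, hence $n \ge |\Sigma| = |I| = a(m, \lfloor t/2 \rfloor)$. If $t$ is even, we are
done.

Now let
\[ \Sigma' = \{\lambda_1 a_1 + \cdots + \lambda_m a_m \mid \lambda_1, \ldots, \lambda_m \in \mathbb{Z},\ \lambda_1 \ge 1,\ \lambda_1 + |\lambda_2| + \cdots + |\lambda_m| = \lfloor t/2 \rfloor + 1\} \]
and
\[ I' = \{(\lambda_1, \ldots, \lambda_m) \mid \lambda_1, \ldots, \lambda_m \in \mathbb{Z},\ \lambda_1 \ge 1,\ \lambda_1 + |\lambda_2| + \cdots + |\lambda_m| = \lfloor t/2 \rfloor + 1\}. \]
We will count the elements in the index set $I'$ as follows. For $k = 0, 1, 2, \ldots, m-1$, let $I'_k$ be
the set of those elements of $I'$ where exactly $k$ of the $m - 1$ coordinates $\lambda_2, \ldots, \lambda_m$ are non-zero.
An argument similar to that in the proof of Theorem 2 shows that
\[ |I'_k| = \binom{m-1}{k} \binom{\lfloor t/2 \rfloor}{k} 2^k, \]
hence
\[ |I'| = \sum_{k=0}^{m-1} \binom{m-1}{k} \binom{\lfloor t/2 \rfloor}{k} 2^k = a(m-1, \lfloor t/2 \rfloor). \]

If $t$ is odd, then the elements listed in $\Sigma'$ must be distinct from each other and from those in $\Sigma$ as
well, thus $n \ge |\Sigma| + |\Sigma'| = |I| + |I'| = a(m, \lfloor t/2 \rfloor) + a(m-1, \lfloor t/2 \rfloor)$. □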


Theorem 10 thus provides an upper bound for the size of a $t$-independent set in $G$ which is of
the order $n^{1/\lfloor t/2 \rfloor}$ as $n$ goes to infinity.
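
The bound $q(m, t)$ of Theorem 10 is simple to evaluate. The sketch below is an illustration only; the function names `a` and `q_lower` are hypothetical, and `a` uses the binomial sum from Theorem 2.

```python
from math import comb


def a(m, s):
    """a(m, s) = sum over k of C(s, k) * C(m, k) * 2^k (Theorem 2)."""
    return sum(comb(s, k) * comb(m, k) * 2**k for k in range(s + 1))


def q_lower(m, t):
    """The quantity q(m, t) of Theorem 10: a group containing a
    t-independent set of size m must have order at least q_lower(m, t)."""
    h = t // 2  # floor(t/2); equals (t-1)/2 when t is odd
    if t % 2 == 0:
        return a(m, h)
    return a(m, h) + a(m - 1, h)


if __name__ == "__main__":
    print(q_lower(1, 5), q_lower(2, 5), q_lower(3, 5))  # 6 18 38
```

These three values are the smallest $n$ with $q(\mathbb{Z}_n, 5) = 1, 2, 3$ in table (10).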


For exact values, we establish the following results. 


Proposition 11 Let $t \ge 2$ be an integer.

1. If $1 \le n \le t$, then $q(\mathbb{Z}_n, t) = 0$.
2. If $t + 1 \le n \le \lfloor t^2/2 \rfloor + t$, then the set $\{1\}$ is $t$-independent in $\mathbb{Z}_n$ and $q(\mathbb{Z}_n, t) = 1$.
3. (a) Suppose that $t$ is even. If $n \ge t^2/2 + t + 1$, then the set $\{t/2, t/2 + 1\}$ is $t$-independent
in $\mathbb{Z}_n$ and $q(\mathbb{Z}_n, t) \ge 2$.
(b) Suppose that $t$ is odd. If $n = (t^2 - 1)/2 + t + 1$, then the set $\{1, t\}$ is $t$-independent in
$\mathbb{Z}_n$ and $q(\mathbb{Z}_n, t) = 2$.




Proof. Let $q(m, t)$ be defined as in Theorem 10. Since $q(1, t) = t + 1$, our first claim follows
from Theorem 10. To prove 2, note that if $n \ge t + 1$, then $\{1\}$ is $t$-independent in $\mathbb{Z}_n$; furthermore,
$q(2, t) = \lfloor t^2/2 \rfloor + t + 1$.

Now let $t$ be even, and assume that $n \ge t^2/2 + t + 1$. We define
\[ \Sigma = \left\{ \lambda_1 \frac{t}{2} + \lambda_2 \left( \frac{t}{2} + 1 \right) \mid \lambda_1, \lambda_2 \in \mathbb{Z},\ |\lambda_1| + |\lambda_2| \le t \right\}. \]
The elements of $\Sigma$ lie in the interval $[-(t^2/2 + t), (t^2/2 + t)]$ and therefore, to prove 3 (a), it suffices
to show that
\[ 0 = \lambda_1 \frac{t}{2} + \lambda_2 \left( \frac{t}{2} + 1 \right) \in \Sigma \]
implies $\lambda_1 = \lambda_2 = 0$, which is an easy exercise. Statement 3 (b) is essentially similar. □


We now turn to the extremal cases of Theorem 10. 


Definition 12 Suppose that A is a t-independent set of size m in G and that q(m,t) is defined as 
in Theorem 10. If |G| =n =q(m,t), then we say that A is a tight t-independent set in G. 


Cases where $\mathbb{Z}_n$ has a tight $t$-independent set for $t = 4$, $t = 5$, and $t = 6$ are marked with
bold-face in (9), (10), and (11). Trivially, the empty-set is a tight $t$-independent set in $\mathbb{Z}_1$ for
every $t$. With (5), (7), Theorem 9, Proposition 11, and one other (sporadic) example, we have the
following tight $t$-independent sets in the cyclic group.


Proposition 13 Let $m$, $n$, and $t$ be positive integers, and let $G = \mathbb{Z}_n$.

1. If $n = 2$, then the set $\{1\}$ is a tight 1-independent set in $G$.
2. If $n = 2m + 1$, then the set $\{1, 2, \ldots, m\}$ is a tight 2-independent set in $G$.
3. If $n = 4m$, then the set $\{1, 3, \ldots, 2m - 1\}$ is a tight 3-independent set in $G$.
4. If $n = t + 1$, then the set $\{1\}$ is a tight $t$-independent set in $G$.
5. Let $n = \lfloor t^2/2 \rfloor + t + 1$. If $t$ is even, then the set $\{t/2, t/2 + 1\}$ is a tight $t$-independent set in
$G$; if $t$ is odd, then the set $\{1, t\}$ is a tight $t$-independent set in $G$.
6. If $n = 38$, then the set $\{1, 7, 11\}$ is a tight 5-independent set in $G$.


Proposition 13 contains every tight (non-empty) $t$-independent set that we could find so far; in
particular, we could not find tight $t$-independent sets for $t \ge 4$ and $m \ge 3$ other than the seemingly
sporadic example listed last. The problem of finding and classifying all tight $t$-independent sets
remains open.
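
Items 5 and 6 of Proposition 13, including the sporadic set $\{1, 7, 11\}$ in $\mathbb{Z}_{38}$, can be confirmed by exhaustive search. The code below is a minimal sketch under that brute-force approach; `is_t_independent` and `tight_pair` are hypothetical helper names, not notation from the paper.

```python
from itertools import product


def is_t_independent(A, n, t):
    """Brute-force test: no nontrivial signed combination of weight <= t is 0 mod n."""
    for coeffs in product(range(-t, t + 1), repeat=len(A)):
        weight = sum(abs(c) for c in coeffs)
        if 0 < weight <= t and sum(c * a for c, a in zip(coeffs, A)) % n == 0:
            return False
    return True


def tight_pair(t):
    """The group order n and the two-element set from item 5 of Proposition 13."""
    n = t * t // 2 + t + 1
    A = [t // 2, t // 2 + 1] if t % 2 == 0 else [1, t]
    return n, A


if __name__ == "__main__":
    # Item 6: the sporadic example.
    print(is_t_independent([1, 7, 11], 38, 5))  # True
    print(is_t_independent([1, 7, 11], 38, 6))  # False, since 4*1 + 7 - 11 = 0
    # Item 5 for a few small values of t.
    print(all(is_t_independent(tight_pair(t)[1], tight_pair(t)[0], t)
              for t in range(2, 8)))            # True
```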


As is clear from our exposition, there is a strong relationship between $s$-spanning sets and
$t$-independent sets when $t$ is even. Namely, we have the following.




Theorem 14 Let $s$ and $t$ be positive integers, $t$ even. Let $A$ be a subset of $G$ of size $m$, and suppose that
$\mathrm{span}(A) = s$ and $\mathrm{ind}(A) = t$.

1. The order $n$ of $G$ satisfies $a(m, t/2) \le n \le a(m, s)$.
2. We have $t \le 2s$.
3. The following three statements are equivalent.

(i) $t = 2s$;
(ii) $A$ is a perfect $s$-spanning set in $G$; and
(iii) $A$ is a tight $t$-independent set in $G$.


The analogous relationship when $t$ is odd is considerably more complicated and will be the
subject of future study. 


4 Spherical designs 


Here we discuss an application of the previous section to spherical combinatorics. We are interested
in placing a finite number of points on the $d$-dimensional sphere $S^d \subset \mathbb{R}^{d+1}$ with the highest
momentum balance. The following definition was introduced by Delsarte, Goethals, and Seidel in
1977 [8].


Definition 15 Let $t$ be a non-negative integer. A finite set $X$ of points on the $d$-sphere $S^d \subset \mathbb{R}^{d+1}$
is a spherical $t$-design, if for every polynomial $f$ of total degree $t$ or less, the average value of $f$
over the whole sphere is equal to the arithmetic average of its values on $X$.


In other words, $X$ is a spherical $t$-design if the Chebyshev-type quadrature formula
\[ \frac{1}{\sigma_d(S^d)} \int_{S^d} f(x)\, d\sigma_d(x) = \frac{1}{|X|} \sum_{x \in X} f(x) \qquad (12) \]
is exact for all polynomials $f : S^d \to \mathbb{R}$ of total degree at most $t$ ($\sigma_d$ denotes the surface measure
on $S^d$).


The concept of $t$-designs on the sphere is analogous to $t$-$(v, k, \lambda)$ designs in combinatorics (see
[24]), and has been studied in various contexts, including representation theory, spherical geometry, 
and approximation theory. For general references see [4], [8], [10], [11], [12], [17], [21], and [25]. 
The existence of spherical designs for every t and d and large enough n = |X| was first proved by 
Seymour and Zaslavsky in 1984 [26]. 


A central question in the field is to find all integer triples $(t, d, n)$ for which a spherical $t$-design
on $S^d$ exists consisting of $n$ points, and to provide explicit constructions for these parameters.




Clearly, to achieve high momentum balance on the sphere, one needs to take a large number of
points. Delsarte, Goethals, and Seidel [8] provide the tight lower bound
\[ n \ge N_t^d := \binom{d + \lfloor t/2 \rfloor}{\lfloor t/2 \rfloor} + \binom{d + \lfloor (t-1)/2 \rfloor}{\lfloor (t-1)/2 \rfloor}. \qquad (13) \]

We shall refer to the bound $N_t^d$ in (13) as the DGS bound. Spherical designs of this minimum size
are called tight. Bannai and Damerell [5], [6] proved that tight spherical designs for $d \ge 2$ exist
only for $t = 1, 2, 3, 4, 5, 7$, or 11. All tight $t$-designs are known, except possibly for $t = 4, 5$, or 7. In
particular, there is a unique 11-design ($d = 23$ and $n = 196560$).
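
The DGS bound (13) is a one-line computation. The sketch below is illustrative; the function name `dgs_bound` is hypothetical and the formula assumes $t \ge 1$.

```python
from math import comb


def dgs_bound(t, d):
    """The DGS lower bound N_t^d on the size of a spherical t-design
    on S^d, for integers t >= 1 and d >= 1."""
    e = t // 2          # floor(t/2)
    f = (t - 1) // 2    # floor((t-1)/2)
    return comb(d + e, e) + comb(d + f, f)


if __name__ == "__main__":
    print(dgs_bound(11, 23))  # 196560, the size of the tight 11-design
    print(dgs_bound(3, 9))    # 20, the smallest possible 3-design on S^9
```

For $d = 1$ the bound reduces to $t + 1$, and for $t = 3$ it equals $2d + 2$; both special cases are used later in this section.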


Let us now attempt to construct spherical designs. One’s intuition that the vertices of a regular 
polygon provide spherical designs on the circle S! is indeed correct; more precisely, we have the 
following. 


Proposition 16 Let $t$ and $n$ be positive integers.

1. If $n \le t$, then there is no $n$-point spherical $t$-design on $S^1$.

2. Suppose that $n \ge t + 1$. For a positive integer $j$, define
\[ z_n^j := \left( \cos\frac{2\pi j}{n}, \sin\frac{2\pi j}{n} \right). \qquad (14) \]
Then the set $X_n := \{z_n^j \mid j = 1, 2, \ldots, n\}$ is a $t$-design on $S^1$.


Proof. 1 follows from the DGS bound as $N_t^1 = t + 1$. To prove 2, we first note that, using
spherical harmonics, one can prove (see [8]) that, in general, a finite set $X$ is a spherical $t$-design
if and only if, for every integer $1 \le k \le t$ and every homogeneous harmonic polynomial $f$ of total
degree $k$,
\[ \sum_{x \in X} f(x) = 0. \]
(A polynomial is harmonic if it is in the kernel of the Laplace operator.) The set of homogeneous
harmonic polynomials of total degree $k$ on the circle, $\mathrm{Harm}_k(S^1)$, is a 2-dimensional vector space
over the reals and is spanned by the polynomials $\mathrm{Re}(z^k)$ and $\mathrm{Im}(z^k)$ where $z = x + \sqrt{-1}\,y$ (we can
think of the elements of $X$ and $S^1$ as complex numbers). Therefore, we see that $X$ is a $t$-design on
$S^1$ if and only if
\[ \sum_{z \in X} z^k = 0 \]
for $k = 1, 2, \ldots, t$. With $X_n$ as defined above, one finds that
\[
\sum_{j=1}^{n} (z_n^j)^k =
\begin{cases}
0 & \text{if } k \not\equiv 0 \bmod n, \\
n & \text{if } k \equiv 0 \bmod n.
\end{cases}
\]

Therefore, $X_n$ is a $t$-design on $S^1$ if and only if $k \not\equiv 0 \bmod n$ for $k = 1, 2, \ldots, t$ (using the
terminology of our last section, if and only if $\{1\}$ is a $t$-independent set in $\mathbb{Z}_n$), that is, if and only if $n \ge t + 1$. □
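
The power-sum criterion used in this proof is easy to test numerically. The following is a minimal sketch using floating-point arithmetic with an assumed tolerance; `power_sum` and `is_circle_design` are hypothetical names.

```python
import cmath


def power_sum(n, k):
    """Sum of the k-th powers of the n-th roots of unity, i.e. of the
    vertices of the regular n-gon viewed as complex numbers."""
    return sum(cmath.exp(2j * cmath.pi * j * k / n) for j in range(1, n + 1))


def is_circle_design(n, t, tol=1e-9):
    """True if the regular n-gon satisfies sum z^k = 0 for k = 1, ..., t,
    up to the numerical tolerance tol."""
    return all(abs(power_sum(n, k)) < tol for k in range(1, t + 1))


if __name__ == "__main__":
    print(is_circle_design(7, 6))  # True:  n >= t + 1
    print(is_circle_design(7, 7))  # False: the k = 7 power sum equals 7
```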




A further classification of $t$-designs on the circle can be found in Hong's paper [18]; he proved,
for example, that if $n \ge 2t + 3$, then there are infinitely many $t$-designs on $S^1$ which do not come
from regular polygons.


We now attempt to generalize Proposition 16 for higher dimensions. For simplicity, we assume
that $d$ is odd, and let $m = (d + 1)/2$. (The case when $d$ is even can be reduced to this case by a
simple technique, see [2] or [19].)


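Let $a_1, a_2, \ldots, a_m$ be integers, and set $A := \{a_1, a_2, \ldots, a_m\}$. For a positive integer $n$, define
\[ X_n(A) := \left\{ \frac{1}{\sqrt{m}} \left( z_n^j(a_1), z_n^j(a_2), \ldots, z_n^j(a_m) \right) \mid j = 1, 2, \ldots, n \right\} \qquad (15) \]
where, like in (14),
\[ z_n^j(a_i) := \left( \cos\frac{2\pi j a_i}{n}, \sin\frac{2\pi j a_i}{n} \right). \]
Note that $X_n(A) \subset S^d$. In [2] we proved the following.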


Theorem 17 ([2]) Let $t$, $d$, and $n$ be positive integers with $t \le 3$, $d$ odd, and set $m = (d + 1)/2$.
For integers $a_1, a_2, \ldots, a_m$, define $X_n(A)$ as in (15). If $A$ is a $t$-independent set in $\mathbb{Z}_n$, then $X_n(A)$
is a spherical $t$-design on $S^d$.

Theorem 17 yields the following results. 
Corollary 18 Let $n$ and $d$ be positive integers, $d$ odd, and set $m = (d + 1)/2$.

1. (a) If $n = 1$, then there is no $n$-point spherical 1-design on $S^d$.
(b) If $n \ge 2$, define $a_i = 1$ for $1 \le i \le m$. Then the set $X_n(A)$, as defined in (15), is a
spherical 1-design on $S^d$.
2. (a) If $n \le d + 1$, then there is no $n$-point spherical 2-design on $S^d$.
(b) If $n \ge d + 2$, define $a_i = i$ for $1 \le i \le m$. Then the set $X_n(A)$, as defined in (15), is a
spherical 2-design on $S^d$.
3. (a) If $n \le 2d + 1$, then there is no $n$-point spherical 3-design on $S^d$.
(b) If $n \ge 2d + 2$ is even or if $n \ge 3d + 3$ is odd, define $a_i = 2i - 1$ for $1 \le i \le m$; if
\[ n \ge \frac{p}{p+1}(3d + 3) \]
where $p$ is a divisor of $n$ which is congruent to 5 mod 6, choose $A$ to be any $m$ elements
of the set in (8). In each case the set $X_n(A)$, as defined in (15), is a spherical 3-design
on $S^d$.




Proof. Parts (a) are from the DGS bounds for $t \le 3$; parts (b) follow from Theorem 17
since, by (5), (7), and the paragraph before Theorem 9, the sets specified are $t$-independent for
$t = 1, 2$, and 3, respectively (note that in all cases of 2 and 3, $m = (d+1)/2 \le q(\mathbb{Z}_n, t)$). □
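Part 2(b) of the corollary can be illustrated numerically. The sketch below builds $X_n(A)$ from (15) and applies the standard moment test for spherical 2-designs (zero centroid, and second-moment matrix equal to $n/(d+1)$ times the identity). This test is a well-known characterization, not the argument used in the text, and the function names are hypothetical.

```python
import math


def x_n(A, n):
    """Points of X_n(A): scaled concatenation of the planar points z_n^j(a_i)."""
    scale = 1 / math.sqrt(len(A))
    points = []
    for j in range(1, n + 1):
        p = []
        for a in A:
            angle = 2 * math.pi * j * a / n
            p += [scale * math.cos(angle), scale * math.sin(angle)]
        points.append(p)
    return points


def is_2_design(points, tol=1e-9):
    """Moment test: centroid is zero and sum of x x^T equals (n / D) * identity."""
    n, dim = len(points), len(points[0])
    for r in range(dim):
        if abs(sum(p[r] for p in points)) > tol:
            return False
        for s in range(dim):
            target = n / dim if r == s else 0.0
            if abs(sum(p[r] * p[s] for p in points) - target) > tol:
                return False
    return True


if __name__ == "__main__":
    d = 3                       # sphere S^3 in R^4, so m = 2 and A = {1, 2}
    for n in (4, 5, 6):
        print(n, is_2_design(x_n([1, 2], n)))
```

For $d = 3$ the test fails at $n = 4 = d + 1$ and succeeds at $n = 5 = d + 2$, matching parts 2(a) and 2(b).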


Part 3 of Corollary 18 leaves the question of existence of 3-designs open for some odd values of
$n$. Note that the minimum value of

$$\frac{p}{p+1}(3d + 3)$$

is $5(d+1)/2$ (when $n$ is divisible by 5). In [2] we proved that a spherical 3-design on $S^d$
($d$ odd) exists for every odd value of $n \ge 5(d+1)/2$, and conjectured that 3-designs do not
exist with $2(d+1) < n < 5(d+1)/2$ and $n$ odd. This conjecture is supported by the numerical
evidence of Hardin and Sloane [16]. A recent result of Boumova, Boyvalenkov, and Danev [7] proves
that no 3-design exists of odd size $n$ with $n \le 2.32(d+1)$. In particular, the case $d = 9$ of
Example 2 in our Introduction is completely settled: 3-designs on $n$ points on $S^9$ exist if and
only if $n \ge 20$ is even or $n \ge 25$ is odd.


The application of $t$-independent sets to spherical $t$-designs seems more complicated when
$t \ge 4$, and will be the subject of an upcoming paper.


Acknowledgments. The author expresses his gratitude to his students Nicolae Laza for 
valuable computations and Nikolay Doskov for an improvement of Proposition 3. 


References 


[1] B. Bajnok. Construction of spherical t-designs. Geom. Dedicata, 43:167-179, 1992.
[2] B. Bajnok. Constructions of spherical 3-designs. Graphs Combin., 14/2:97-107, 1998.


[3] B. Bajnok and I. Ruzsa. The independence number of a subset of an abelian group. Integers, 
3/Paper A2, 23 pp. (electronic), 2003. 


[4] E. Bannai. On extremal finite sets in the sphere and other metric spaces. London Math. Soc.
Lecture Note Ser., 131:13-38, 1988.


[5] E. Bannai and R. M. Damerell. Tight spherical designs I. J. Math. Soc. Japan, 31:199-207, 
1979. 


[6] E. Bannai and R. M. Damerell. Tight spherical designs II. J. London Math. Soc. (2), 21:13-30, 
1980. 


[7] S. Boumova, P. Boyvalenkov, and D. Danev. New nonexistence results for spherical designs.
In B. Bojanov, editor, Constructive Theory of Functions, pages 225-232, Varna, 2002. 


[8] P. Delsarte, J. M. Goethals, and J. J. Seidel. Spherical codes and designs. Geom. Dedicata,
6:363-388, 1977. 


[9] P. Erdős and R. Freud. A Sidon problémakör. Mat. Lapok, 1991/2:1-44, 1991.
[10] C. D. Godsil. Algebraic Combinatorics. Chapman and Hall, Inc., 1993.




[11] J. M. Goethals and J. J. Seidel. Spherical designs. In D. K. Ray-Chaudhuri, editor, Relations 
between combinatorics and other parts of mathematics, volume 34 of Proc. Sympos. Pure Math., 
pages 255-272. American Mathematical Society, 1979. 


[12] J. M. Goethals and J. J. Seidel. Cubature formulae, polytopes and spherical designs. In
C. Davis, B. Grünbaum, and F. A. Sherk, editors, The Geometric Vein: The Coxeter Festschrift,
pages 203-218. Springer-Verlag New York, Inc., 1981.


[13] S. W. Graham. $B_h$ sequences. In B. C. Berndt, H. G. Diamond, and A. J. Hildebrand, editors,
Analytic number theory, Vol. 1 (Allerton Park, IL, 1995), pages 431-449, Progr. Math. 138,
Birkhäuser Boston, Boston, MA, 1996.


[14] R. K. Guy. Unsolved Problems in Number Theory. Second edition. Springer-Verlag New York, 
1994. 


[15] H. Halberstam and K. F. Roth. Sequences. Second edition. Springer-Verlag New York — Berlin, 
1983. 


[16] R. H. Hardin and N. J. A. Sloane. McLaren’s improved snub cube and other new spherical 
designs in three dimensions. Discrete Comput. Geom., 15:429-441, 1996. 


[17] S. G. Hoggar. Spherical t-designs. In C. J. Colbourn and J. H. Dinitz, editors, The CRC
handbook of combinatorial designs, pages 462-466. CRC Press, Inc., 1996. 


[18] Y. Hong. On spherical t-designs in $\mathbb{R}^2$. Europ. J. Combinatorics, 3:255-258, 1982.
[19] J. Mimura. A construction of spherical 2-design. Graphs Combin., 6:369-372, 1990. 


[20] M. B. Nathanson. Additive Number Theory: Inverse Problems and the Geometry of Sumsets. 
Springer-Verlag New York, 1996. 


[21] B. Reznick. Sums of even powers of real linear forms. Mem. Amer. Math. Soc., 463, 1992. 
[22] I. Ruzsa. Solving linear equations in a set of integers I. Acta Arith., 65/3:259-282, 1993. 
[23] I. Ruzsa. Solving linear equations in a set of integers II. Acta Arith., 72/4:385-397, 1995. 
[24] J. J. Seidel. Designs and approximation. Contemp. Math., 111:179-186, 1990. 


[25] J. J. Seidel. Spherical designs and tensors. In E. Bannai and A. Munemasa, editors, Progress 
in algebraic combinatorics, volume 24 of Adv. Stud. Pure Math., pages 309-321. Mathematical 
Society of Japan, 1996. 


[26] P. D. Seymour and T. Zaslavsky. Averaging sets: A generalization of mean values and spherical
designs. Adv. Math., 52:213-240, 1984. 


[27] W. D. Wallis, A. P. Street, and J. S. Wallis. Combinatorics: Room Squares, Sum-free Sets,
Hadamard Matrices, Lecture Notes in Mathematics, Vol. 292, Part 3. Springer-Verlag, Berlin- 
New York, 1972. 


A formula related to the Frobenius problem in two
dimensions ¹


MATTHIAS BECK AND SINAI ROBINS ²


1 Introduction 


Given positive integers $a_1, \ldots, a_n$ with $\gcd(a_1, \ldots, a_n) = 1$, we call an integer $t$
representable if there exist nonnegative integers $m_1, \ldots, m_n$ such that

$$t = \sum_{j=1}^{n} m_j a_j.$$

In this paper, we discuss the linear diophantine problem of Frobenius: namely,
find the largest integer which is not representable. We call this largest integer the
Frobenius number $g(a_1, \ldots, a_n)$. We study a more general problem: namely,
we consider $N(t)$, the number of nonnegative integer solutions $(m_1, \ldots, m_n)$ to
$\sum_{j=1}^{n} m_j a_j = t$ for any positive integral $t$. In this paper, we concentrate on two
dimensions and obtain the formula

$$N(t) = \frac{t}{a_1 a_2} - \left(\left(\frac{a_2^{-1} t}{a_1}\right)\right) - \left(\left(\frac{a_1^{-1} t}{a_2}\right)\right).$$

Here

$$((x)) = x - [x] - \frac{1}{2}$$

is called a sawtooth function, $a_1^{-1} a_1 \equiv 1 \pmod{a_2}$, and $a_2^{-1} a_2 \equiv 1 \pmod{a_1}$.
From this apparently new identity, we immediately recover and extend two well-known
results: $g(a_1, a_2) = a_1 a_2 - a_1 - a_2$ ([Sy]), and exactly half of the integers
between 1 and $(a_1 - 1)(a_2 - 1)$ are representable ([Ni-Wi]). Note that we can
rephrase the definition of $g(a_1, a_2)$ to be the largest zero of $N(t)$.


More generally, we say that an integer $t$ is k-representable if $N(t) = k$; that
is, $t$ can be represented in exactly $k$ ways. Define $g_k = g_k(a_1, \ldots, a_n)$ to be
the largest k-representable integer. It is fairly easy to see that for each $k$,
eventually all integers are representable in more than $k$ ways. Hence $g_k$ is well-defined,
and every integer greater than $g_k$ is representable in at least $k + 1$ ways. In
particular $g_0(a_1, \ldots, a_n) = g(a_1, \ldots, a_n)$. We prove statements about $g_k$
similar in spirit to the two classical results mentioned above. Finally, we define
$S_k = \{t \in \mathbb{N} : N(t) > k\}$. It is easy to see that $S_k$ is a semigroup for all
positive integers $k$. Note that $S_0 \supset S_1 \supset S_2 \supset \ldots$, which hence
provides an interesting filtration of the positive integers. This filtration forms an
arithmetic progression in two dimensions but becomes nonlinear in dimension greater than two.


¹ Preprint, December 13, 1999. Keywords: the linear diophantine problem of Frobenius,
rational polytopes, lattice points.

² Dept. of Mathematics, Temple University, Philadelphia, PA 19122
matthias@math.temple.edu, srobins@math.temple.edu


D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 


2 Proof of the formula 


The Frobenius setting can be translated into a lattice point problem: we look
at integer dilates of the rational polytope

$$P = \Big\{ (x_1, \ldots, x_n) \in \mathbb{R}^n : x_j \ge 0,\ \sum_{j=1}^{n} a_j x_j = 1 \Big\}$$

and ask for the largest such dilate which contains no lattice point. In the
two-dimensional case, writing $a = a_1$ and $b = a_2$, we want to study

$$N(t) = \#\{ (m, n) \in \mathbb{Z}^2 : m, n \ge 0,\ am + bn = t \},$$

which is the lattice point count on the hypotenuse of the triangle $tT$, where

$$T = \{ (x, y) \in \mathbb{R}^2 : x, y \ge 0,\ ax + by \le 1 \}.$$


We compute the quantity N(t) by partial fractions, inspired by their applica- 
tions to Dedekind sums in [Ge]. We note that one does not have to think of 
N(t) as the lattice point count of a polytope to understand the proof of the 
following theorem; however, this geometric interpretation was the motivation 
for our proof. 


Theorem 2.1

$$N(t) = \frac{t}{ab} - \left(\left(\frac{b^{-1} t}{a}\right)\right) - \left(\left(\frac{a^{-1} t}{b}\right)\right).$$
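The formula of Theorem 2.1 reads $N(t) = t/(ab) - ((b^{-1}t/a)) - ((a^{-1}t/b))$, where $((x)) = x - [x] - 1/2$ and the inverses are taken modulo $a$ and $b$ respectively. As a sanity check, the sketch below compares it with a direct count in exact rational arithmetic. The function names are hypothetical, and `pow(b, -1, a)` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import floor, gcd


def sawtooth(x: Fraction) -> Fraction:
    """((x)) = x - [x] - 1/2, with no special value at the integers."""
    return x - floor(x) - Fraction(1, 2)


def count_brute(t: int, a: int, b: int) -> int:
    """Number of pairs (m, n) of nonnegative integers with a*m + b*n == t."""
    return sum(1 for m in range(t // a + 1) if (t - a * m) % b == 0)


def count_formula(t: int, a: int, b: int) -> Fraction:
    """N(t) = t/(ab) - ((b^{-1} t / a)) - ((a^{-1} t / b)) for coprime a, b."""
    assert gcd(a, b) == 1
    b_inv = pow(b, -1, a)   # inverse of b modulo a
    a_inv = pow(a, -1, b)   # inverse of a modulo b
    return (Fraction(t, a * b)
            - sawtooth(Fraction(b_inv * t, a))
            - sawtooth(Fraction(a_inv * t, b)))


if __name__ == "__main__":
    a, b = 3, 5
    for t in range(0, 20):
        print(t, count_brute(t, a, b), count_formula(t, a, b))
```

Both columns agree for every $t$ tried, including the zeros at the nonrepresentable values.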


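Proof. $N(t)$ is the constant coefficient of the generating function

$$f(z) = \Big( \sum_{m \ge 0} z^{am} \Big) \Big( \sum_{n \ge 0} z^{bn} \Big) z^{-t} = \frac{1}{(1 - z^a)(1 - z^b) z^t}.$$

We will expand this function into partial fractions:

$$f(z) = \sum_{\lambda^a = 1}{}^{'} \frac{A_\lambda}{z - \lambda} + \sum_{\lambda^b = 1}{}^{'} \frac{B_\lambda}{z - \lambda} + \frac{C_1}{z - 1} + \frac{C_2}{(z - 1)^2} + \sum_{k=1}^{t} \frac{D_k}{z^k}.$$

(Alternatively, we could shift the constant coefficient to the residue and use the
residue theorem.) Here, $'$ means we omit the trivial root of unity $\lambda = 1$. Note
that the nontrivial roots of unity yield simple poles, since $a$ and $b$ are relatively
prime. The reason for doing this is the following: the constant coefficient of the
right-hand side, and thus $N(t)$, equals

$$N(t) = - \sum_{\lambda^a = 1}{}^{'} \frac{A_\lambda}{\lambda} - \sum_{\lambda^b = 1}{}^{'} \frac{B_\lambda}{\lambda} - C_1 + C_2. \qquad (1)$$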


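The computation of the coefficients $A_\lambda$ for a nontrivial $a$'th root of unity $\lambda$ is
straightforward:

$$A_\lambda = \lim_{z \to \lambda} \frac{z - \lambda}{(1 - z^a)(1 - z^b) z^t} = - \frac{1}{a (1 - \lambda^b) \lambda^{t-1}}.$$

Similarly, we obtain for a nontrivial $b$'th root of unity $\lambda$

$$B_\lambda = - \frac{1}{b (1 - \lambda^a) \lambda^{t-1}}.$$

The coefficients $C_2$ and $C_1$ are simply the two leading coefficients of the Laurent
series of $f$ about $z = 1$. Using the expansion

$$\frac{1}{1 - z^a} = - \frac{1}{a} (z - 1)^{-1} + \frac{a - 1}{2a} + O(z - 1),$$

it is easy to see that

$$C_2 = \frac{1}{ab}, \qquad C_1 = \frac{1}{ab} - \frac{1}{2a} - \frac{1}{2b} - \frac{t}{ab}.$$

Hence we can rewrite (1) as

$$N(t) = \frac{1}{a} \sum_{\lambda^a = 1}{}^{'} \frac{1}{(1 - \lambda^b) \lambda^t} + \frac{1}{b} \sum_{\lambda^b = 1}{}^{'} \frac{1}{(1 - \lambda^a) \lambda^t} + \frac{1}{2a} + \frac{1}{2b} + \frac{t}{ab}. \qquad (2)$$

We claim that

$$\frac{1}{a} \sum_{\lambda^a = 1}{}^{'} \frac{1}{(1 - \lambda) \lambda^t} = - \left(\left(\frac{t}{a}\right)\right) - \frac{1}{2a}, \qquad (3)$$

from which the statement of the theorem follows from (2) by

$$\frac{1}{a} \sum_{\lambda^a = 1}{}^{'} \frac{1}{(1 - \lambda^b) \lambda^t} = \frac{1}{a} \sum_{\lambda^a = 1}{}^{'} \frac{1}{(1 - \lambda) \lambda^{b^{-1} t}} \overset{(3)}{=} - \left(\left(\frac{b^{-1} t}{a}\right)\right) - \frac{1}{2a},$$

together with the analogous identity in which the roles of $a$ and $b$ are exchanged.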
It remains to prove (3), which is equivalent to the well-known finite Fourier
series for the sawtooth function (see, e.g., [Ra-Gr]). For the sake of completeness
we give a short proof of (3), again based on lattice point considerations: consider
the interval

$$\mathcal{I} := \left[ 0, \frac{t}{a} \right] = \{ x \in \mathbb{R} : x \ge 0,\ ax \le t \},$$

viewed as a one-dimensional polytope. The lattice point count in $\mathcal{I}$ is clearly

$$\#(\mathcal{I} \cap \mathbb{Z}) = \left[ \frac{t}{a} \right] + 1 = \frac{t}{a} - \left(\left(\frac{t}{a}\right)\right) + \frac{1}{2}. \qquad (4)$$

On the other hand, we can write this number, by applying the above ideas to
the interval $\mathcal{I}$, as the constant coefficient of

$$\frac{1}{(1 - z^a)(1 - z) z^t}.$$

If we expand this function into partial fractions in the exact same manner as
above and compare constant coefficients, which equal the lattice point count,
we obtain

$$\#(\mathcal{I} \cap \mathbb{Z}) = \frac{t}{a} + \frac{1}{2a} + \frac{1}{2} + \frac{1}{a} \sum_{\lambda^a = 1}{}^{'} \frac{1}{(1 - \lambda) \lambda^t}. \qquad (5)$$

Comparing (4) with (5) yields (3). □
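Identity (3), which states that $\frac{1}{a}\sum' \frac{1}{(1-\lambda)\lambda^t} = -((t/a)) - \frac{1}{2a}$ with the sum over the nontrivial $a$-th roots of unity, can also be checked in floating point. This is only a numerical illustration with hypothetical function names, not a substitute for the proof.

```python
import cmath
import math


def root_sum(a: int, t: int) -> complex:
    """(1/a) * sum over nontrivial a-th roots of unity of 1 / ((1 - lam) * lam**t)."""
    total = 0
    for r in range(1, a):
        lam = cmath.exp(2j * cmath.pi * r / a)
        total += 1 / ((1 - lam) * lam ** t)
    return total / a


def sawtooth_side(a: int, t: int) -> float:
    """-((t/a)) - 1/(2a) with ((x)) = x - [x] - 1/2."""
    x = t / a
    return -(x - math.floor(x) - 0.5) - 1 / (2 * a)


if __name__ == "__main__":
    for a in (2, 3, 7):
        for t in range(0, 2 * a):
            print(a, t, round(root_sum(a, t).real, 12), round(sawtooth_side(a, t), 12))
```

The imaginary part of the root sum vanishes up to rounding, since the terms for $\lambda$ and its conjugate are complex conjugates of each other.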


3 Consequences of the formula for N(t) 


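From Theorem 2.1, we can derive two basic results on the Frobenius problem.
First, Sylvester's result ([Sy]) is a straightforward consequence:


Corollary 3.1 $g(a, b) = ab - a - b$.


Proof. We have to show that $N(ab - a - b) = 0$ and that $N(t) > 0$ for every
$t > ab - a - b$. First, by the periodicity of the sawtooth function,

$$N(ab - a - b) = \frac{ab - a - b}{ab} - \left(\left(\frac{b^{-1}(ab - a - b)}{a}\right)\right) - \left(\left(\frac{a^{-1}(ab - a - b)}{b}\right)\right)$$
$$= 1 - \frac{1}{a} - \frac{1}{b} - \left(\left(\frac{-1}{a}\right)\right) - \left(\left(\frac{-1}{b}\right)\right)$$
$$= 1 - \frac{1}{a} - \frac{1}{b} - \left( \frac{1}{2} - \frac{1}{a} \right) - \left( \frac{1}{2} - \frac{1}{b} \right) = 0.$$

For any integer $m$, $((m/a)) \le \frac{1}{2} - \frac{1}{a}$. Hence for any positive integer $n$,

$$N(ab - a - b + n) \ge \frac{ab - a - b + n}{ab} - \left( \frac{1}{2} - \frac{1}{a} \right) - \left( \frac{1}{2} - \frac{1}{b} \right) = \frac{n}{ab} > 0. \qquad \Box$$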
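Sylvester's value $g(a, b) = ab - a - b$ is easy to confirm by exhaustive search for small coprime pairs. The sketch below stops scanning once it has seen $a$ consecutive representable integers, since adding multiples of $a$ then covers everything larger; the function names are hypothetical.

```python
from math import gcd


def representable(t: int, a: int, b: int) -> bool:
    """True if t = a*m + b*n for some nonnegative integers m, n."""
    return any((t - a * m) % b == 0 for m in range(t // a + 1))


def frobenius_brute(a: int, b: int) -> int:
    """Largest nonrepresentable integer, found by scanning upward from 0."""
    assert gcd(a, b) == 1 and min(a, b) >= 2
    largest, run, t = -1, 0, 0
    while run < a:               # a consecutive hits imply all larger t are hits
        if representable(t, a, b):
            run += 1
        else:
            largest, run = t, 0
        t += 1
    return largest


if __name__ == "__main__":
    for a, b in [(3, 5), (4, 7), (5, 9), (7, 11)]:
        print(a, b, frobenius_brute(a, b), a * b - a - b)
```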




Corollary 3.2 Exactly half of the integers between 1 and (a — 1)(b — 1) are 
representable. 


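Proof. We first claim that, if $t \in [1, ab - 1]$ is not a multiple of $a$ or $b$,

$$N(t) + N(ab - t) = 1. \qquad (6)$$

This identity follows directly from Theorem 2.1:

$$N(ab - t) = \frac{ab - t}{ab} - \left(\left(\frac{b^{-1}(ab - t)}{a}\right)\right) - \left(\left(\frac{a^{-1}(ab - t)}{b}\right)\right)$$
$$= 1 - \frac{t}{ab} - \left(\left(\frac{-b^{-1} t}{a}\right)\right) - \left(\left(\frac{-a^{-1} t}{b}\right)\right)$$
$$\overset{(*)}{=} 1 - \frac{t}{ab} + \left(\left(\frac{b^{-1} t}{a}\right)\right) + \left(\left(\frac{a^{-1} t}{b}\right)\right)$$
$$= 1 - N(t).$$

Here, $(*)$ follows from the fact that $((-x)) = -((x))$ if $x \notin \mathbb{Z}$. This shows that,
for $t$ between 1 and $ab - 1$ and not divisible by $a$ or $b$, exactly one of $t$ and $ab - t$
is not representable. There are

$$ab - a - b + 1 = (a - 1)(b - 1) = g(a, b) + 1$$

integers between 1 and $ab - 1$ which are not divisible by $a$ or $b$. Finally, we note
that $N(t) > 0$ if $t$ is a multiple of $a$ or $b$, by the very definition of $N(t)$. Hence
the number of non-representable integers is $\frac{1}{2}(a - 1)(b - 1)$. □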
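Both the symmetry (6) and the resulting count of non-representable integers can be verified directly for small coprime pairs. A minimal sketch with hypothetical function names:

```python
def count(t: int, a: int, b: int) -> int:
    """Number of pairs (m, n) of nonnegative integers with a*m + b*n == t."""
    return sum(1 for m in range(t // a + 1) if (t - a * m) % b == 0)


def symmetry_holds(a: int, b: int) -> bool:
    """Check N(t) + N(ab - t) == 1 for all t in [1, ab-1] not divisible by a or b."""
    return all(count(t, a, b) + count(a * b - t, a, b) == 1
               for t in range(1, a * b)
               if t % a != 0 and t % b != 0)


def nonrepresentable(a: int, b: int) -> list:
    """All positive integers below ab that have no representation."""
    return [t for t in range(1, a * b) if count(t, a, b) == 0]


if __name__ == "__main__":
    a, b = 3, 5
    print(symmetry_holds(a, b))      # symmetry (6)
    print(nonrepresentable(a, b))    # the (a-1)(b-1)/2 gaps
```

For $a = 3$, $b = 5$ the gaps are 1, 2, 4 and 7, which is $(a-1)(b-1)/2 = 4$ integers.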


Note that we proved even more. By (6), every positive integer less than ab has at 
most one representation. Hence, the representable integers in the above corollary 
are uniquely representable. We now study integers that are k-representable. 


Corollary 3.3 N(t+ ab) = N(t) +1. 


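Proof. By the periodicity of the sawtooth function,

$$N(t + ab) = \frac{t + ab}{ab} - \left(\left(\frac{b^{-1}(t + ab)}{a}\right)\right) - \left(\left(\frac{a^{-1}(t + ab)}{b}\right)\right)$$
$$= \frac{t}{ab} + 1 - \left(\left(\frac{b^{-1} t}{a}\right)\right) - \left(\left(\frac{a^{-1} t}{b}\right)\right) = N(t) + 1. \qquad \Box$$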


Corollary 3.4 $g_k(a, b) = (k + 1)ab - a - b$.


Proof. By the preceding corollary, $g_{k+1} = g_k + ab$. The statement follows now
inductively. □
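The closed form $g_k(a, b) = (k+1)ab - a - b$ can be compared against a direct search. The sketch below uses the search bound $(k+2)ab$, which is an assumption of this illustration; it is safe because $N(t + ab) = N(t) + 1$ forces $N(t) \ge k + 1$ once $t \ge (k+1)ab$. Function names are hypothetical.

```python
def count(t: int, a: int, b: int) -> int:
    """Number of pairs (m, n) of nonnegative integers with a*m + b*n == t."""
    return sum(1 for m in range(t // a + 1) if (t - a * m) % b == 0)


def g_brute(k: int, a: int, b: int) -> int:
    """Largest t with exactly k representations, searched below (k+2)*a*b."""
    return max(t for t in range((k + 2) * a * b) if count(t, a, b) == k)


if __name__ == "__main__":
    a, b = 4, 7
    for k in range(4):
        print(k, g_brute(k, a, b), (k + 1) * a * b - a - b)
```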


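Corollary 3.5 Given $k \ge 2$, the smallest k-representable integer is $ab(k - 1)$.


Proof. Let $n$ be a nonnegative integer. Then

$$N(ab(k - 1) - n) = \frac{ab(k - 1) - n}{ab} - \left(\left(\frac{b^{-1}(ab(k - 1) - n)}{a}\right)\right) - \left(\left(\frac{a^{-1}(ab(k - 1) - n)}{b}\right)\right)$$
$$= k - 1 - \frac{n}{ab} - \left(\left(\frac{-b^{-1} n}{a}\right)\right) - \left(\left(\frac{-a^{-1} n}{b}\right)\right). \qquad (7)$$

If $n = 0$, (7) equals $k$. If $n$ is positive, we use $((x)) \ge -\frac{1}{2}$ to see that

$$N(ab(k - 1) - n) \le k - \frac{n}{ab} < k. \qquad \Box$$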


All nonrepresentable positive integers lie, by definition, in the interval $[1, g(a, b)]$.
It is easy to see that the smallest interval containing all uniquely representable
integers is $[\min(a, b), g_1]$. For $k \ge 2$, the corresponding interval always has
length $2ab - a - b + 1$, and the precise interval is given next.


Corollary 3.6 Given $k \ge 2$, the smallest interval containing all k-representable
integers is $[g_{k-2} + a + b,\ g_k]$.

Proof. By Corollaries 3.4 and 3.5, the smallest integer in the interval is

$$ab(k - 1) = g_{k-2} + a + b.$$

The upper bound of the interval follows by definition of $g_k$. □


Corollary 3.7 There are exactly $ab - 1$ integers which are uniquely representable.
Given $k \ge 2$, there are exactly $ab$ k-representable integers.




Proof. First, in the interval $[1, ab]$, there are, by Corollaries 3.2 and 3.5,

$$ab - \frac{(a - 1)(b - 1)}{2} - 1$$

1-representable integers. Using Corollary 3.3, we see that there are

$$\frac{(a - 1)(b - 1)}{2}$$

1-representable integers above $ab$. For $k \ge 2$, the statement follows by similar
reasoning. □
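The counts in Corollary 3.7 and the lower endpoint from Corollary 3.5 can be tallied by brute force. This sketch counts positive integers only (so $t = 0$ is excluded) and scans below $(k_{\max} + 2)ab$, a bound assumed here because every larger integer has more than $k_{\max}$ representations. The function names are hypothetical.

```python
from collections import Counter


def count(t: int, a: int, b: int) -> int:
    """Number of pairs (m, n) of nonnegative integers with a*m + b*n == t."""
    return sum(1 for m in range(t // a + 1) if (t - a * m) % b == 0)


def tally(a: int, b: int, k_max: int) -> dict:
    """Map k -> number of positive integers with exactly k representations."""
    limit = (k_max + 2) * a * b
    freq = Counter(count(t, a, b) for t in range(1, limit))
    return {k: freq[k] for k in range(1, k_max + 1)}


def smallest(k: int, a: int, b: int) -> int:
    """Smallest positive integer with exactly k representations."""
    t = 1
    while count(t, a, b) != k:
        t += 1
    return t


if __name__ == "__main__":
    print(tally(3, 5, 4))
    print([smallest(k, 3, 5) for k in range(2, 5)])
```

For $a = 3$, $b = 5$ the tally gives $ab - 1 = 14$ uniquely representable integers and $ab = 15$ integers for each $k \ge 2$.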


4 Final remarks 


This paper gives only a glimpse of how the 'polytope-view' of the Frobenius
problem can be used to gain new results. In dimensions higher than 2, generalized
Dedekind sums ([Ge]) appear in the formulas for $N(t)$, which increases
the complexity of the problem. The details will be described in a forthcoming
paper ([Be-Di-Ro]).


We conclude with a few remarks regarding extensions of the above corollaries
to higher dimensions. Although no 'nice' formula similar to Sylvester's result
(Corollary 3.1) is known in dimensions greater than 2, there has been a huge
effort devoted to giving bounds and algorithms for the Frobenius number. Secondly,
we remark that Corollary 3.2 does not extend in general; however, [Ni-Wi]
gives necessary and sufficient conditions on the $a_j$'s under which Corollary 3.2
does extend. Corollary 3.3 extends easily to higher dimensions ([Be-Di-Ro]).
We leave the reader with the following


Problem Extend Corollaries 3.1, 3.4, 3.5, 3.6, and 3.7 to higher dimensions.


References 


[Be-Di-Ro] M. BECK, R. DIAZ, S. ROBINS, The Frobenius problem, rational polytopes, and
Fourier-Dedekind sums, in preparation.

[Ge] I. GESSEL, Generating functions and generalized Dedekind sums, Electronic J. Comb. 4
(no. 2) (1997).

[Ni-Wi] A. NIJENHUIS, H. S. WILF, Representations of integers by linear forms in nonnegative
integers, J. Number Theory 4 (1972), 98-106.

[Ra-Gr] H. RADEMACHER, E. GROSSWALD, Dedekind sums, Carus Mathematical Monographs,
The Mathematical Association of America (1972).

[Sy] J. J. SYLVESTER, Mathematical questions with their solutions, Educational Times 41
(1884), 171-178.


One Bit World 


David V. Chudnovsky¹ and Gregory V. Chudnovsky²


¹ IMAS, Polytechnic University, Brooklyn, 6 MetroTech Center, NY 11201
david@imas.poly.edu

² IMAS, Polytechnic University, Brooklyn, 6 MetroTech Center, NY 11201
gregory@imas.poly.edu


1 Introduction. 


We want to acknowledge Michael Gerzon of Oxford who had been an early 
pioneer of one bit audio. 

In the beginning there was a word. Analog audio was the first signal seri- 
ously studied and recorded. The first efforts in quantization of analog signals 
were directed towards audio and are known as PCM (Pulse Code Modulation). 

PCM is a technique patented in 1938 by Reeves. It continues to be a
major underpinning of all digital speech, audio and video more than 50 years
after the first hardware implementation in 1947 (H.S. Black, Bell Laboratories).
Shannon and others analyzed PCM in detail in 1948. It essentially consists
of three steps of processing of analog signals:

sampling, 

quantizing and 

binary encoding. 

The sampler converts a continuous time analog waveform $x(t)$ into a discrete
sequence of samples $x_n = x(n/f_s)$ with a sample frequency $f_s$, usually
after the analog waveform is processed by a lowpass filter with the cutoff
frequency $f_s/2$, the Nyquist limit. The samples $x_n$ are quantized and represented
in a binary form as binary integers of a given dynamic range, which defines
the number of quantization levels.
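The three PCM steps can be sketched in a few lines. The following illustration assumes an input signal with values in $[-1, 1]$, a uniform quantizer, and two's-complement encoding. It omits the anti-aliasing lowpass filter, and all names are hypothetical.

```python
import math


def pcm_encode(x, num_samples: int, fs: float, bits: int = 16) -> list:
    """Sample x(t) at rate fs and quantize each sample to a signed `bits`-bit integer."""
    levels = 2 ** (bits - 1)
    codes = []
    for n in range(num_samples):
        sample = x(n / fs)                        # sampling: x_n = x(n / f_s)
        q = round(sample * (levels - 1))          # uniform quantization
        q = max(-levels, min(levels - 1, q))      # clip to the dynamic range
        codes.append(q)
    return codes


def to_binary(code: int, bits: int = 16) -> str:
    """Binary encoding of one sample as a two's-complement bit string."""
    return format(code & (2 ** bits - 1), f"0{bits}b")


if __name__ == "__main__":
    tone = lambda t: math.sin(2 * math.pi * 440.0 * t)   # 440 Hz test tone
    codes = pcm_encode(tone, num_samples=8, fs=44100.0)
    for c in codes:
        print(c, to_binary(c))
```

With 16 bits the quantizer has $2^{16}$ levels, which is the dynamic range referred to in the text.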

These days the mathematically sophisticated PCM machinery is pushed 
under the rug and given a general name of A/D - analog to digital conversion. 
The dynamic range of A/D converters built by many manufacturers can be 
impressive, though most of the world is so far relatively happy with 16-bit 
(or even less than 16-bit) A/D convertors. This essentially means that analog 
waveforms are only sampled within 16 bit accuracy at uniform time intervals. 

The vastness of digital information encoded with the PCM quantization 
(most often without any compression whatsoever) is staggering. We think that 
the largest commercial repository of digital information is in the form of the 
high fidelity audio files. These days anybody can get them. Just download 




your favorite ripper program from the Web, pop in your favorite music CD
and in anywhere between a few minutes and a few hours (depending on how
much you paid for that CD drive) you get over 500 megabytes of PCM data:
a stream of 16-bit integers adequately representing music (or voice) sampled
at 44.1 kHz.

On top of PCM files, which are simple and uncompressed, one has a
multitude of compression (or compression with encryption) formats. There is
an even larger set of formats dealing with compression of the PCM information of
video files. The video files can be thought of as two-dimensional PCM files,
often represented as a linear PCM file following the line scan of the video.

Almost all stored digital non-trivial information is in the form of audio
or video content. The music information is slightly redundant, but so far no
lossless compression of music has achieved even a 2 : 1 compression rate.

The PCM represents the first step into the One Bit World.

Technically the PCM digitalization of the information can be considered
as its conversion into a "one bit" stream if one looks at its radix 2 positional
number representation. This is not what we have in mind when we speak about
the one bit world. Here we literally mean sequences of 0 and 1 (in fact, we prefer
to call them +1 and -1) representing individual bit-words.

The actual reason for the need of one-bit data stream, accurately repre- 
senting an analog continuous signal, has to do with the hardware. The A to 
D conversion of high accuracy is a grand idea, but, in reality, the nonlinear 
effects and manufacturing difficulties make it very hard to achieve A to D con- 
verters of high dynamic range. On the other hand, a one-bit A/D convertor is 
a linear device (in a mathematical sense), and represents, basically, an easily 
realized comparator. Moreover, once a one bit A to D convertor is built, it is a 
matter of mathematics and not of circuit practice to turn, in digital domain, a 
one-bit stream, albeit of a much higher frequency, into a high dynamic range 
digital sample stream. 

This is the principle behind what is known as Delta-Sigma (Δ-Σ): differentiate
the bandwidth-limited analog signal (Delta) into a one-bit signal (and
the remaining noise) stream; and then integrate, or, rather, filter (Sigma) the
one-bit signal stream into a required digitalization of a high dynamic range.


1.1 The Δ-Σ Solution.


The first account of the Delta-Sigma belongs to C.C. Cutler in his U.S. patent
of March 1960, [CU]. H. Inose and Y. Yoshida [IY] introduced the term "Delta-Sigma"
and did the first analysis. In this country J.C. Candy in 1974 [CA74]
introduced the name Sigma-Delta and popularized its use for A/D and D/A
converters.

Starting in 1987 in a series of papers by R.M. Gray and his coworkers
the rigorous mathematical treatment of the Delta-Sigma appeared, [CA97],
[GR87], [GR97], [GRCW], [GRND], [NST]. Not surprisingly, this is a number-theoretical
treatment, relying heavily on the theory of uniform distribution.




We give just the barest example of a relationship with diophantine approx- 
imations, following Gray [GR89]. We do not have to dwell on the modern 
development of uniform distribution as applied to the Delta-Sigma for differ- 
ent classes of functions. We do want to refer to interesting papers of Gunturk, 
some of which were presented at this seminar; cf. [GC]. 

Delta-Sigma modulation is defined by the following (nonlinear) difference
equations in the time domain:

$$u_n = \begin{cases} u^* & \text{if } n = 0, \\ x_{n-1} - \varepsilon_{n-1} & \text{for } n = 1, 2, \ldots \end{cases}$$

Here $\varepsilon_n$ (quantization noise) is defined in terms of $q(u)$, the output
of the one bit quantizer, defined in the simplest case as

$$q(u) = \begin{cases} b & \text{if } u \ge 0, \\ -b & \text{otherwise,} \end{cases}$$

and

$$\varepsilon_n = q(u_n) - u_n, \qquad -b \le \varepsilon_n \le b,$$

is the binary quantization noise.

The intuitive idea of oversampled A/D is to shape the binary quantization 
noise with the filter, so that the quantization noise is mostly out of the base- 
band. A low pass digital filter is used as a decoding filter to produce a digital 
representation of the analog input. 
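The recursion $u_n = x_{n-1} - \varepsilon_{n-1}$, $\varepsilon_n = q(u_n) - u_n$ is short enough to simulate directly. In the sketch below the decoder is a plain running average (a boxcar lowpass filter), which is an assumption made for simplicity rather than the decoding filter analyzed in the text; the initial state `u0` and the function names are likewise hypothetical.

```python
def delta_sigma(xs, b: float = 1.0, u0: float = 0.0) -> list:
    """First-order loop: returns the one-bit outputs q(u_1), ..., q(u_N)."""
    def q(u: float) -> float:
        return b if u >= 0 else -b

    u = u0
    bits = []
    for x in xs:                 # x plays the role of x_{n-1}
        eps = q(u) - u           # quantization noise eps_{n-1}
        u = x - eps              # u_n = x_{n-1} - eps_{n-1}
        bits.append(q(u))
    return bits


def decode_average(bits) -> float:
    """Crude decoding filter: the mean of the one-bit stream."""
    return sum(bits) / len(bits)


if __name__ == "__main__":
    x = 0.3
    for n in (10, 100, 1000):
        print(n, decode_average(delta_sigma([x] * n)))
```

Since $q(u_n) = x_{n-1} + \varepsilon_n - \varepsilon_{n-1}$, the sum of the outputs telescopes and the average differs from a constant input $x$ by at most $2b/N$, so the estimate tightens as more one-bit samples are averaged.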

If the input $x_n \in [-b, b)$ and the linear decoding filter $H$ has an impulse
response $(h_0, h_1, \ldots, h_N)$, then the Sigma-Delta A/D converter with decoding
filter $H$ and sampled at time $N$ will satisfy

$$H(q(u_n)) = \sum_{i=1}^{N} h_{N-i}\, q(u_i) = \sum_{i=1}^{N} h_{N-i}\, x_{i-1} + \sum_{i=1}^{N} h_{N-i} (\varepsilon_i - \varepsilon_{i-1}).$$

Consider a good noise shaping filter. E.g., a sinc² filter is formed by cascade
realization of two sinc (comb) filters. Unlike the sinc filter where each sample
has equal weight, the weight of a sinc² filter is given by

$$h_k = \begin{cases} \dfrac{k - 1}{N^2} & \text{if } k = 2, 3, \ldots, N + 1, \\[2mm] \dfrac{2N - k + 1}{N^2} & \text{for } k = N + 2, \ldots, 2N, \end{cases}$$

which has a triangular shape.
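The triangular shape arises because cascading two equal-weight (sinc) filters of length $N$ convolves two boxcars. The sketch below assumes the weights $h_k = (k-1)/N^2$ for $k = 2, \ldots, N+1$ and $h_k = (2N-k+1)/N^2$ for $k = N+2, \ldots, 2N$, which is a reading of the formula above, and checks them against that convolution in exact arithmetic. Function names are hypothetical.

```python
from fractions import Fraction


def sinc2_weights(N: int) -> list:
    """Assumed triangular weights h_k for k = 2, ..., 2N (returned in that order)."""
    weights = []
    for k in range(2, 2 * N + 1):
        numerator = k - 1 if k <= N + 1 else 2 * N - k + 1
        weights.append(Fraction(numerator, N * N))
    return weights


def boxcar_cascade(N: int) -> list:
    """Convolution of two length-N boxcar filters with equal weights 1/N."""
    box = [Fraction(1, N)] * N
    out = [Fraction(0)] * (2 * N - 1)
    for i, p in enumerate(box):
        for j, r in enumerate(box):
            out[i + j] += p * r
    return out


if __name__ == "__main__":
    print([str(w) for w in sinc2_weights(4)])
    print(sinc2_weights(4) == boxcar_cascade(4))
```

Under this reading the weights sum to 1, so a constant input passes through the decoder unchanged.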

The first example of the appearance of diophantine approximation (uniform
distribution theory) is the seemingly simple case of the DC signal (constant
value): $x_n = x$. In the context of the simplest quantizer $q(x)$ above, this
leads to the problem of discrepancy of the sequence $\{n\beta\}$ for $\beta = \frac{x + b}{2b}$. With
the notation of the discrepancy function $D_N(\{\beta\}, \ldots, \{N\beta\})$, determined by
the continued fraction expansion of $\beta$, we get Gray's bound on the accuracy
of DC-level reconstruction:
The absolute decoding error with sinc² decoding filter satisfies


J 2 ~ Q(a) |S 5: Dw({B},.-, {NB}). 


Now we can characterize the importance of the Delta-Sigma in the acqui- 
sition of Petabytes of audio data. It is the Delta-Sigma that makes current 
CDs and other HiFi music possible. 

In the current CD mastering process an oversampled one-bit Delta - Sigma 
A/D is used as a front end, but the one-bit output signal must be decimated 
and re-quantized in real time to form the regular sample rate PCM signal. 

Recording a one-bit signal instead of a multi-bit recording format has the 
following advantages: 

1) simple system structure, because of the serial nature without any im- 
plied framing (nothing like the classical endian mode incompatibility); 

2) because the signal is oversampled, the system characteristics approach 
that of high quality analog audio; 

3) the bit stream for a known sampling rate is completely independent of 
the noise shaping filter, thus tapes recorded with different noise shapers can 
be interchanged, or; 

4) recorder performance can be optimized by changing the noise shaping 
filter and not the format; 

5) tape recorders are upward and downward compatible in terms of the 
format and only the quality of replay changes. 

These new formats are the underpinning of new audio files: SACD - super 
audio compact disc and DVD audio. We refer to Gerzon’s last paper for the 
original motivation [CRGE]. 

The Delta-Sigma relaxes demand on the analog circuitry at the expense 
of the increased demand placed on digital circuitry. It is the filtering DSP 
part - commonly known as noise shaping - that becomes paramount. Filters 
of length up to 2K are known to have been built into the Delta-Sigma. The 
quantization part of the Delta-Sigma is a nonlinear iterative mapping, and 
thus has wonderful properties of complex dynamical systems (limit cycles, 
various attractors, ... etc.). Moreover, to get a better noise shape Delta-Sigma 
is often cascaded to provide even more and more sophisticated dynamical 
systems, with even less chances of a complete analysis (cf. [NST]). 

One of the most difficult problems in Δ-Σ design is the choice of a high
frequency, a high multiple of the Nyquist limit, at which to quantize the
signal. For a sufficiently high multiple no spectral information is lost, but at
the expense of much greater storage and bandwidth requirements.

One of the advantages and simultaneously a problem with Delta-Sigma is 
the local character of its one bit quantization. When one can have a buffered 
stream of analog signals, one can make global decisions on one bit quantization 
in an asynchronous fashion, rather than providing one new bit every clock tick. 




This global optimal one-bit quantization is very appropriate for the Delta-Sigma
in the D/A case (when the stream of data is known a priori).

There is yet another hardware term describing this approach - Pulse Width 
Modulation (PWM). PWM is one-bit representation of an analog form as a 
train of +1 or —1 square pulses (often, PWM is even mistaken for Delta part 
in Delta-Sigma). 

The Optimal PWM (OPWM) approach provides the optimal number of
one bit samples needed to reconstruct the original signal stream (in contrast
with Δ-Σ). However, in general the PWM and OPWM approaches do not
have any particular frequency at which the edges of the pulses are sampled:
in theory the location of the edges is arbitrary.

The OPWM problem for arbitrary harmonic series is solved in this pa- 
per using a combination of methods of orthogonal polynomials and nonlinear 
completely integrable systems. 


2 Optimal Pulse Width Modulation (OPWM). 


2.1 General Formulation. 


The most general Optimal PWM formulation is the following. Given a spectral
bandwidth limited (analog) waveform, find a square form of amplitudes ±1
that most closely approximates this waveform in the low frequency range.
More precisely, given a periodic harmonic series

$$w(t) = b_0 + \sum_{n=1}^{N} (a_n \sin nt + b_n \cos nt)$$

(with the fundamental domain $[0, 2\pi]$), find the periodic pulse train $f(t)$
with $2N + 1$ (unknown) edges $\alpha_i$ ($i = 1, \ldots, 2N + 1$) in the fundamental domain
such that the Fourier expansion of $f(t)$ has the leading $2N + 1$ coefficients the
same as $w(t)$.

The main period part of the pulse train can be expressed as follows:

$$0 = \alpha_0 < \alpha_1 < \alpha_2 < \ldots < \alpha_{2N} < \alpha_{2N+1} < \alpha_{2N+2} = 2\pi.$$


There are, of course, power and other isoperimetric constraints on the
harmonic form $w(t)$ for it to be accurately approximated by a square form
$f(t)$ (with the leading $2N + 1$ harmonic terms). The simplest of them is a
consequence of the Parseval identity (and expresses the fact that the amplitudes
of individual waves have to be limited in order to be represented by one bit
pulses):

$$2 b_0^2 + \sum_{n=1}^{N} (a_n^2 + b_n^2) \le 2.$$
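The constraint $2b_0^2 + \sum_{n \le N}(a_n^2 + b_n^2) \le 2$ can be observed on any concrete ±1 pulse train. The sketch below computes the Fourier coefficients exactly interval by interval, using the convention $w(t) = b_0 + \sum (a_n \sin nt + b_n \cos nt)$. The edge positions in the example are arbitrary illustrative values and the function names are hypothetical.

```python
import math


def pulse_train_coeffs(edges, N: int, first_sign: int = 1):
    """Fourier coefficients (b0, [a_1..a_N], [b_1..b_N]) of a +-1 pulse train on [0, 2*pi].

    edges: sorted switching points inside (0, 2*pi); the train equals
    first_sign on [0, edges[0]) and flips sign at every edge.
    """
    points = [0.0] + list(edges) + [2 * math.pi]
    b0 = 0.0
    a = [0.0] * (N + 1)
    b = [0.0] * (N + 1)
    sign = first_sign
    for lo, hi in zip(points, points[1:]):
        b0 += sign * (hi - lo) / (2 * math.pi)
        for n in range(1, N + 1):
            # a_n multiplies sin(nt), b_n multiplies cos(nt)
            a[n] += sign * (math.cos(n * lo) - math.cos(n * hi)) / (n * math.pi)
            b[n] += sign * (math.sin(n * hi) - math.sin(n * lo)) / (n * math.pi)
        sign = -sign
    return b0, a[1:], b[1:]


def parseval_lhs(edges, N: int) -> float:
    """2*b0^2 + sum_{n=1}^{N} (a_n^2 + b_n^2) for the pulse train."""
    b0, a, b = pulse_train_coeffs(edges, N)
    return 2 * b0 ** 2 + sum(x * x + y * y for x, y in zip(a, b))


if __name__ == "__main__":
    edges = [0.5, 1.7, 2.9, 4.0, 5.5]
    for N in (1, 5, 50):
        print(N, parseval_lhs(edges, N))
```

The printed values increase with $N$ but never exceed 2, because the full (infinite) Parseval sum of a ±1 waveform equals exactly 2.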




In general, we ask for

$$\max_{t \in [0, 2\pi]} |w(t)| \le 1.$$

If a good pulse approximation $f(t)$ to $w(t)$ can be found, this $f(t)$ can be
used as a "digital" (or, rather, a one-bit) representation of $w(t)$. It can be
quantized in the $t$-domain by rounding the corresponding values of $\alpha_i$ to a
fixed frequency. It can be further compressed using additional properties of
the distribution of $\alpha_i$. Jumping a little bit ahead, such asymptotic properties
of $\alpha_i$ do exist, because the $\alpha_i$ are related to the zeroes of orthogonal polynomials.
This representation is also excellent for storage and transmission, especially
for audio signals.

Of course, in the real world, i.e., in the analog world, the square waveform 
does not look at all as a harmonic one. Thus to make a square form look 
(and, most importantly, sound) good, it has to pass through a low pass filter to 
restore its spectral bandwidth limitation. In audio playback this is the famous 
problem of psycho-acoustic spectral matching (according to an experimentally 
observed graph of spectral weights). For instrumentation providing the best 
quality of reproduction, the one bit nature of the encoded waveform is crucial, 
because one bit A/D and D/A converters provide the best signal quality. 


2.2 The Classical Pulse Width Modulation (PWM) Problem. 


The classical PWM - Pulse Width Modulation had been developed in the 
60s and 70s as a one-bit representation of waveforms with a very interesting 
feature: only square forms of amplitudes +1 or —1 are allowed, with positive 
and negative edges at arbitrary sample times. This is in a sharp contrast with 
PCM/DPCM and Delta-Sigma (Sigma-Delta), where the sampling happens 
at uniformly spaced times. Such an asynchronicity /non-uniform sampling is 
an important tool in any attempt to fight the Nyquist limit on the number of 
sample points vs. spectral support of the signal. 

Initial interest in PWM had been almost entirely driven by special require- 
ments of power electronics. There the only important signal to be represented 
is the primary (lowest) sine harmonics. Typical applications of PWM are 
in power electronics: digitally controlled electric motors, high voltage power 
transmission and control of quality of electrical power grids. 

In simple mathematical terms the basic PWM problem is that of approximating
a sine wave of a given amplitude $A$:

$$A \sin t \quad \text{on } [0, 2\pi]$$

by a train of square pulses of +1 or -1 amplitude (independent of $A$) such
that the spectral characteristic of the periodic pulse train (square waveform) is
the same as that of the sine wave $A \sin t$ in the low frequency range.

Because of the quarter period symmetry, it is enough to consider a square
form only on $[0, \pi/2]$. If one has $N$ square forms to define on $[0, \pi/2]$ that
are "spectrally approximating" the sine wave, one has $N$ free parameters (edges of the
pulses). Patel in 1970 was the first to derive an explicit set of equations
on $N$ pulse edges to eliminate $N$ harmonics in the Fourier expansion of the
full periodic pulse train (with $4N$ pulses per full period), see [PH]. It should
be noted that in view of the quarter period symmetry, only $\sin(nt)$ terms are
left with $n$ odd. The resulting equations have the following form.

For the quarter-period symmetric (periodic) pulse train $f(t)$ with edges
$0 = \alpha_0 < \alpha_1 < \ldots < \alpha_N < \alpha_{N+1} = \pi/2$ (and $f(\alpha_0) = -1$, $f(\alpha_{N+1}) = +1$),
we have the following harmonic expansion

$$f(t) = \sum_{n \ge 1,\ n \text{ odd}}^{\infty} V_n \sin(nt),$$

where

$$V_n = \frac{4}{n\pi} \Big( -1 + 2 \sum_{i=1}^{N} (-1)^{i+1} \cos(n \alpha_i) \Big).$$

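Thus for the approximation of the basic waveform

$$a_1 \sin t$$

to within the first $N + 1$ harmonics by the pulse train, the following set of
$N$ transcendental equations on the unknown $\alpha_i$ has to be satisfied

$$\sum_{i=1}^{N} (-1)^{i+1} \cos(n\alpha_i) = c_n, \qquad n = 1, 3, \dots, 2N - 1,$$

where

$$c_n = \begin{cases} \bigl(\tfrac{\pi}{4} a_1 + 1\bigr)/2 & \text{if } n = 1, \\ 1/2 & \text{if } n > 1. \end{cases}$$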
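The closed form for $V_n$ can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the function names and the sample edge angles are hypothetical, and the pulse train is assumed to equal $-1$ before the first edge and to flip sign at every edge, as described above.

```python
import math

def harmonic_coefficient(n, edges):
    """Closed form V_n for odd n; edges = [alpha_1, ..., alpha_N] inside (0, pi/2)."""
    # enumerate() is 0-based, so (-1)**i equals (-1)^(i+1) for the 1-based edge index.
    alternating = sum((-1) ** i * math.cos(n * alpha) for i, alpha in enumerate(edges))
    return 4.0 / (n * math.pi) * (-1.0 + 2.0 * alternating)

def pulse_value(t, edges):
    """Pulse train on [0, pi/2]: -1 before alpha_1, sign flips at every edge."""
    flips = sum(1 for alpha in edges if t >= alpha)
    return -1.0 if flips % 2 == 0 else 1.0

def harmonic_by_quadrature(n, edges, steps=100_000):
    """Midpoint-rule estimate of (4/pi) * integral over [0, pi/2] of f(t) sin(n t) dt."""
    h = (math.pi / 2) / steps
    total = sum(pulse_value((k + 0.5) * h, edges) * math.sin(n * (k + 0.5) * h)
                for k in range(steps))
    return 4.0 / math.pi * total * h

if __name__ == "__main__":
    edges = [0.3, 0.7, 1.2]  # hypothetical edge angles, for illustration only
    for n in (1, 3, 5):
        print(n, harmonic_coefficient(n, edges), harmonic_by_quadrature(n, edges))
```

The two columns should agree to within the quadrature error, which is dominated by the cells containing the pulse edges.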


In general, for an arbitrary quarter-period symmetric harmonic polynomial we
get the "system of Classical PWM equations" (in the transcendental form) by
writing $\beta_i = \alpha_i$ for odd $i$ and $\beta_i = \pi - \alpha_i$ for
even $i$:

$$\sum_{i=1}^{N} \cos(n\beta_i) = c_n, \qquad n = 1, 3, \dots, 2N - 1.$$


The case of an approximation of a single basic waveform (main PWM case)
leads to the following system of transcendental equations:

$$\sum_{i=1}^{n} \cos\alpha_i = a, \qquad \sum_{i=1}^{n} \cos(2m-1)\alpha_i = 0, \quad m = 2, \dots, n.$$




3 Orthogonal Polynomials, Solitons and Classical PWM 
Problem 


The classical PWM problem can be reduced (see below) to an algebraic
problem of "odd sums of powers", where one has to determine the set $\{x_i\}$
($i = 1 \dots n$) from the equations on "odd sums of powers":

$$\sum_{i=1}^{n} x_i^{2m-1} = s_{2m-1}, \qquad m = 1 \dots n.$$


This and the more general OPWM problem are solved analytically using Padé
approximations and completely integrable isospectral deformation equations.
We present here an analytical solution, and we describe fast methods of
numerical solution needed for practical applications.

This work on the Classical PWM has a variety of possible practical
applications. It is a part of an effort at Polytechnic University conducted by
Dariusz Czarkowski, Ivan Selesnick and the authors of this paper. The main
emphasis of that effort is on the practical implementation of the PWM solution
that can be useful in power electronics. See a detailed review in [CCCS].

A very interesting feature of the general solution to the PWM problem lies 
in its connection to the classical areas of mathematics - symmetric functions, 
orthogonal polynomials, and the theory of completely integrable systems. The 
most surprising relationship is with the Korteweg-de Vries (KdV) hierarchy 
of infinite dimensional Hamiltonians. We show how the complete solution of 
the PWM problem describes, in fact, the class of all rational solutions of KdV 
equations. 


4 Subsequences of Symmetric Functions and zeros of 
Orthogonal Polynomials 


The problem of finding the set of $n$ elements $\{x_i\}$ with given values of
$n$ arbitrary symmetric functions $\{s_j\}$ in $x_i$, $i = 1 \dots n$, is in
general a very complicated one because of the nontrivial nature of relations
between symmetric functions of high degrees. Only in special cases can this
problem be reduced to an "exactly solvable" one. For this one looks at the
polynomial $P(x)$, whose roots are $\{x_i\}$:

$$P(x) = \prod_{i=1}^{n} (x - x_i).$$

This polynomial $P(x)$ uniquely identifies the set $\{x_i\}$. The coefficients
of $P(x)$ are elementary symmetric functions in $x_i$, which have to be
determined in order to determine the set $\{x_i\}$. Newton's formulas for sums
of powers of $x_i$ show that once one knows all first $n$ symmetric functions

$$s_j = \sum_{i=1}^{n} x_i^j, \qquad j = 1 \dots n,$$

then one easily finds, by means of Newton's linear recurrences, the elementary
symmetric functions in $x_i$ (the coefficients of $P(x)$), consequently finding
the set $\{x_i\}$ as the set of roots of $P(x)$.

The derivation of Newton's recurrences is simple; it requires a look at the
logarithmic derivative of $P(x)$:

$$\frac{P'(x)}{P(x)} = \sum_{i=1}^{n} \frac{1}{x - x_i}.$$

Expanding the logarithmic derivative at $x = \infty$, we get:

$$\frac{P'(x)}{P(x)} = \sum_{m=0}^{\infty} \frac{s_m}{x^{m+1}}$$

for $s_m = \sum_{i=1}^{n} x_i^m$. Comparing this with the standard expansion of
$P(x)$ at $x = \infty$,

$$P(x) = x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n,$$

one gets Newton's linear recursion relations between $a_i$ and $s_i$.
Newton's identities explicitly are (with $a_0 = 1$):

$$k\, a_k = -\sum_{i=1}^{k} s_i\, a_{k-i}.$$
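Newton's recurrence translates directly into a short routine. The sketch below is illustrative only (the function name is hypothetical); it uses exact rational arithmetic and costs $O(n^2)$ operations.

```python
from fractions import Fraction

def coefficients_from_power_sums(s):
    """Given s = [s_1, ..., s_n], return [a_1, ..., a_n] for
    P(x) = x^n + a_1 x^(n-1) + ... + a_n, via k*a_k = -sum_{i=1..k} s_i * a_{k-i}."""
    a = [Fraction(1)]  # a_0 = 1
    for k in range(1, len(s) + 1):
        acc = sum(Fraction(s[i - 1]) * a[k - i] for i in range(1, k + 1))
        a.append(-acc / k)
    return a[1:]

if __name__ == "__main__":
    roots = [1, 2, 3]
    s = [sum(r ** j for r in roots) for j in range(1, 4)]  # power sums s_1, s_2, s_3
    print(coefficients_from_power_sums(s))  # coefficients of (x-1)(x-2)(x-3)
```

For the roots 1, 2, 3 this recovers $x^3 - 6x^2 + 11x - 6$.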


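Other relations between $a_i$ and $s_i$ can be found using the following
identity (that we will use in later generalizations):

$$P(x) = x^n \cdot \exp\Bigl(-\sum_{m=1}^{\infty} \frac{s_m}{m\, x^m}\Bigr).$$

Very important combinatorial interpretations of the last identity were used
extensively by MacMahon and others ([M], [Li]):

$$a_k = \sum_{\lambda} (-1)^{\lambda_1 + \lambda_2 + \lambda_3 + \cdots}\, \frac{s_1^{\lambda_1} \cdot s_2^{\lambda_2} \cdot s_3^{\lambda_3} \cdots}{1^{\lambda_1} \cdot 2^{\lambda_2} \cdot 3^{\lambda_3} \cdots\ \lambda_1!\, \lambda_2!\, \lambda_3! \cdots}$$

for all partitions $\lambda$ where $1 \cdot \lambda_1 + 2 \cdot \lambda_2 + 3 \cdot \lambda_3 + \dots = k$.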


What happens if, instead of the classical consecutive sets of symmetric
functions, one knows only the values of $n$ non-consecutive symmetric functions?
The problem posed in the PWM method of power electronics looks first at
$n$ odd power sum symmetric functions $s_{2m-1}$ for $m = 1 \dots n$. The
solution of that problem also leads to a sequence of linear recurrences, but
this time not among the invariants themselves but among the sequences of
polynomials associated with them (these are $P_n(x) = P(x)$ as $n$ varies).
This solution is based on Padé approximations and orthogonal polynomials.

We start with the general relation between Padé approximations and the
generalization of Newton relations between power and elementary symmetric
functions. Let us look at the Padé approximation of the order $(n, d)$ to the
series $g(x)$ at $x = \infty$. Here the function $g(x)$ is defined via the
generating function of an arbitrary sequence $S_m$:

$$g(x) = \exp\Bigl(-\sum_{m=1}^{\infty} \frac{S_m}{m\, x^m}\Bigr).$$


The definition of the Padé approximation of the order $(n, d)$ to $g(x)$ at
(the neighborhood of) $x = \infty$ is the following: it is a rational function
$P_n(x)/Q_d(x)$, with $P_n(x)$ a polynomial of degree $n$ and $Q_d(x)$ a
polynomial of degree $d$, such that the expansion of $P_n(x)/Q_d(x)$ matches
the expansion of $x^{n-d} g(x)$ at $x = \infty$ up to the maximal order. This
means that

$$\frac{P_n(x)}{Q_d(x)} - x^{n-d} g(x) = O(x^{-2d-1}),$$

or

$$P_n(x) - Q_d(x)\, x^{n-d} g(x) = O(x^{-d-1}).$$


After taking the logarithmic derivative of this expression, we end up with
the following representation of the definition of Padé approximation:

$$\frac{P_n'(x)}{P_n(x)} - \frac{Q_d'(x)}{Q_d(x)} = \frac{d}{dx} \log\bigl(x^{n-d} g(x)\bigr) + O(x^{-n-d-2}).$$


Now if we write the normalized (with the leading coefficient 1) polynomials
$P_n(x)$ and $Q_d(x)$ in terms of their roots:

$$P_n(x) = \prod_{i=1}^{n} (x - x_i); \qquad Q_d(x) = \prod_{k=1}^{d} (x - y_k),$$

we get an identification of symmetric functions in $x_i$ and $y_k$ with the
sequence of $S_m$ in the definition of
$g(x) = \exp\bigl(-\sum_{m=1}^{\infty} S_m/(m\, x^m)\bigr)$. Namely, we get:

$$\sum_{i=1}^{n} x_i^j - \sum_{k=1}^{d} y_k^j = S_j$$

for $j = 0 \dots n + d$.

Of course, in the case $d = 0$ one recovers Newton's identities. The case
$d = n$, the so-called case of the "diagonal" Padé approximations, is the
most interesting case. It is also the case that solves the problem of "sums of
odd powers". This is how it works. Consider the "anti-symmetric" case when
$y_i = -x_i$ for $i = 1 \dots n$ and $d = n$. In this case $S_{2m} = 0$ for
$m > 0$ and $S_{2m-1} = 2 s_{2m-1}$. We thus get Padé approximations of the
order $(n, n)$ to the following function:

$$G(x) = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{s_m}{m\, x^m}\Bigr).$$


The Padé approximants $P_n(x)/Q_n(x)$ have the property:

$$Q_n(x) = (-1)^n P_n(-x),$$

because $G(x)$ satisfies a functional identity: $G(-x) = 1/G(x)$.

This gives us our main result: 

Theorem 1. The solution $\{x_i\}$ ($i = 1 \dots n$) to the problem of "odd sums
of powers":

$$\sum_{i=1}^{n} x_i^{2m-1} = s_{2m-1}, \qquad m = 1 \dots n,$$

is given by the roots of the numerator $P_n(x) = \prod_{i=1}^{n} (x - x_i)$ in
the Padé approximation problem of order $(n, n)$ to the function

$$G(x) = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{s_m}{m\, x^m}\Bigr).$$


Another way to verify this approximation without specialization from the
case of the general sequence $S_m$ is simply to take the expansion of the
logarithmic derivative of $P_n(x)/P_n(-x)$ at $x = \infty$. Expanding the
logarithmic derivative, and then integrating it (formally) in $x$, one gets a
very simple identity

$$(-1)^n \frac{P_n(x)}{P_n(-x)} = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{\sum_{i=1}^{n} x_i^m}{m\, x^m}\Bigr).$$

From this identity Theorem 1 follows. This is a new formula that Newton
could have found.
Proof of Theorem 1. First of all, the Padé approximation rational function
$P_n(x)/Q_n(x)$ of order $(n, n)$ is unique. Then, if $P_n(x)/Q_n(x)$ is a Padé
approximation of order $(n, n)$ to $G(x)$, we assume that this representation
of the rational function is irreducible (i.e., that $P_n(x)$ and $Q_n(x)$ are
relatively prime). Then $Q_n(x)/P_n(x)$ is a Padé approximation of order
$(n, n)$ to $1/G(x)$, and $P_n(-x)/Q_n(-x)$ is a Padé approximation of order
$(n, n)$ to $G(-x)$. Because of the functional equation $G(-x) = 1/G(x)$ and
the uniqueness of the Padé approximations, we get
$Q_n(x)/P_n(x) = P_n(-x)/Q_n(-x)$. This equation means that
$Q_n(x) = \alpha \cdot P_n(-x)$. Moreover, since the expansion of $G(x)$ at
$x = \infty$ starts at 1, we have $P_n(x)/Q_n(x) \to 1$ as $x \to \infty$.
Thus $Q_n(x) = (-1)^n P_n(-x)$. Taking into account the "main" identity

$$(-1)^n \frac{P_n(x)}{P_n(-x)} = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{\sum_{i=1}^{n} x_i^m}{m\, x^m}\Bigr)$$

we can see that the right hand side of this identity and the expansion of
$G(x)$ at $x = \infty$ have to agree up to (but not including) $x^{-2n-1}$.
This means that we have $\sum_{i=1}^{n} x_i^{2m-1} = s_{2m-1}$ for
$m = 1 \dots n$. Note that in the definition of $G(x)$, the values of
$s_{2m-1}$ for $m > n$ have no impact on the definition of $P_n(x)$ because
they enter the expansion of $G(x)$ only at $x^{-k}$ for $k \ge 2n + 1$.
This is so because

$$\exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{s_m}{m\, x^m}\Bigr) = \exp\Bigl(-2 \sum_{m\ \mathrm{odd},\ m \le 2n-1} \frac{s_m}{m\, x^m}\Bigr) + O(x^{-2n-1}).$$

Since we identified the solution to the "sums of odd powers" problem with the
numerator (or denominator) in the (diagonal) Padé approximation problem,
we infer from the standard theory of continued fraction expansions that the
rational functions $P_n(x)/Q_n(x)$ are partial fractions in the continued
fraction expansion of the generating function $G(x)$ at $x = \infty$. This
also means (see [S] for these and other facts of the theory of continued
fraction expansions and orthogonal polynomials) that the sequence of
polynomials $P_n(x)$ is a sequence of orthogonal polynomials (with respect to
the weight that is the Hilbert transform of $G(x)$), and that the sequence of
polynomials $P_n(x)$ satisfies a three-term linear recurrence relation. Since
the same recurrence is satisfied both by the numerators and the denominators of
the partial fractions, it means that the recurrence is satisfied by two
sequences: $P_n(x)$ and $(-1)^n \cdot P_n(-x)$. Since the leading coefficient
of $P_n(x)$ is 1, one gets a particularly simple three-term recurrence
relation among the $P_n(x)$:

$$P_{n+1}(x) = x \cdot P_n(x) + C_n \cdot P_{n-1}(x)$$

for $n = 0, \dots$.


5 Algebraic Problem of Odd Sums of Powers vs the
Transcendental Problem of Sums of Odd Cosines


The original definition of the PWM problem dealt with the transcendental
equations $\sum_{i=1}^{n} \cos(2m-1)\alpha_i = c_{2m-1}$, and not the algebraic
equations $\sum_{i=1}^{n} x_i^{2m-1} = s_{2m-1}$. A very important contribution
to the PWM problem by D. Czarkowski and I. Selesnick is the explicit reduction
of the transcendental PWM problem to the algebraic one for consecutive
$m = 1, \dots, n$. This is what allows us here to use the solution of the
algebraic problem of "sums of odd powers" for the solution of the original
transcendental PWM problem. We will show now how the transcendental case is
explicitly expressed using the introduced notations of $G(x)$ and $P_n(x)$.
The basic transformation is $\cos\alpha_i = x_i$, or in the algebraic form:
$x = (z + z^{-1})/2$, for $z = e^{i\alpha}$. Now, to get from the algebraic
"sums of odd powers" solution to the transcendental one, consider the set of
roots $z_i, z_i^{-1}$; $i = 1, \dots, n$. Then the "sums of odd powers" for
these roots give

$$2 \sum_{i=1}^{n} \cos(2m-1)\alpha_i \quad \text{for } \cos\alpha_i = x_i.$$


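This allows us to represent the polynomials $P_n(x)$ in $z$-form. Namely, if we
define

$$\mathcal{P}_n(z) = \prod_{i=1}^{n} (z - z_i), \qquad \tilde{\mathcal{P}}_n(z) = \prod_{i=1}^{n} \Bigl(z - \frac{1}{z_i}\Bigr),$$

then we can express $\mathcal{P}_n(z) \cdot \tilde{\mathcal{P}}_n(z)$ as a
polynomial in $x = \frac{1}{2}(z + 1/z)$:
$\mathcal{P}_n(z) \cdot \tilde{\mathcal{P}}_n(z) = (2z)^n \cdot P_n(x)$, since
$(z - z_i)(z - 1/z_i) = 2z\,(x - x_i)$. This allows us to express the Padé
approximation to $G(x)$ (and $G(z)$) as a similar function of $z$. We have:

$$(-1)^n \cdot \frac{P_n(x)}{P_n(-x)} = \frac{\mathcal{P}_n(z)\, \tilde{\mathcal{P}}_n(z)}{\mathcal{P}_n(-z)\, \tilde{\mathcal{P}}_n(-z)}.$$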

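Expanding each term on the right as a function of $z^{-1}$, we get

$$(-1)^n \cdot \frac{P_n(x)}{P_n(-x)} = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{T_m}{m\, z^m}\Bigr),$$

where $T_m = \sum_{i=1}^{n} (z_i^m + z_i^{-m}) = 2 \sum_{i=1}^{n} \cos m\alpha_i$.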

This means that the function $G(x)$ (or its Padé approximation) that
determines the solution of the algebraic "sums of odd powers" problem (in $x$)
can be reduced to the transcendental "sums of odd cosines" problem (in $z$).
Specifically, $G(x)$, as a function of $z$, has the following form:

$$G(x) = \exp\Bigl(-4 \sum_{m\ \mathrm{odd}} \frac{c_m}{m\, z^m}\Bigr),$$

where the sequence $\{c_m\}$ arises from the following general transcendental
"sums of odd cosines" problem

$$\sum_{i=1}^{n} \cos(2m-1)\alpha_i = c_{2m-1}, \qquad m = 1, \dots, n,$$

corresponding to the algebraic "sums of odd powers" problem:

$$\sum_{i=1}^{n} x_i^{2m-1} = s_{2m-1}, \qquad m = 1, \dots, n,$$

with $x_i = \cos\alpha_i$ for $i = 1, \dots, n$.
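One concrete way to pass from the cosine sums $c_{2m-1}$ to the power sums $s_{2m-1}$ is through Chebyshev polynomials, since $\cos(k\alpha) = T_k(\cos\alpha)$ and $T_k$ contains only odd powers for odd $k$. The sketch below is an illustration under that assumption, not necessarily the reduction used by the authors; the function names are hypothetical.

```python
from fractions import Fraction

def chebyshev_coefficients(k_max):
    """T[k][j] = coefficient of x^j in the Chebyshev polynomial T_k, k = 0..k_max."""
    T = [[1], [0, 1]]
    for k in range(1, k_max):
        nxt = [0] * (k + 2)
        for j, coeff in enumerate(T[k]):      # 2x * T_k
            nxt[j + 1] += 2 * coeff
        for j, coeff in enumerate(T[k - 1]):  # minus T_{k-1}
            nxt[j] -= coeff
        T.append(nxt)
    return T

def odd_power_sums_from_cosine_sums(c):
    """Map [c_1, c_3, ..., c_{2n-1}] to [s_1, s_3, ..., s_{2n-1}] where
    c_k = sum_i cos(k*alpha_i) and s_k = sum_i cos(alpha_i)^k."""
    n = len(c)
    T = chebyshev_coefficients(2 * n - 1)
    s = {}
    for m in range(1, n + 1):
        k = 2 * m - 1
        # c_k = sum over odd j <= k of T[k][j] * s_j; solve the triangular system for s_k.
        known = sum(T[k][j] * s[j] for j in range(1, k, 2))
        s[k] = (Fraction(c[m - 1]) - known) / T[k][k]
    return [s[2 * m - 1] for m in range(1, n + 1)]

if __name__ == "__main__":
    a = Fraction(1, 2)  # hypothetical amplitude parameter
    print(odd_power_sums_from_cosine_sums([a, 0, 0]))
```

For the main PWM case $c_1 = a$, $c_3 = c_5 = 0$ this gives $s_1 = a$, $s_3 = 3a/4$, $s_5 = 5a/8$.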

Notice that the "explicit expression" for $\alpha_i$ ($x_i$) or $P_n(x)$ simply
means that the continued fraction expansion of $G(x)$ (in $x$ or $z$ at
infinity) is known "explicitly", or equivalently that the coefficients $C_n$
in the main three-term recurrence describing $P_n(x)$ are "explicit" (i.e.,
classical elementary or transcendental) functions of $n$. From this point of
view, the main PWM transcendental problem

$$\sum_{i=1}^{n} \cos\alpha_i = a, \qquad \sum_{i=1}^{n} \cos(2m-1)\alpha_i = 0, \quad m = 2, \dots, n$$

is not "explicitly solvable" because the corresponding function $G(x)$ (see the
expression above in terms of $z$ and the corresponding sequence $\{c_m\}$):

$$G_a(x) = e^{-4a(x - \sqrt{x^2 - 1})}$$


does not have an "explicit" continued fraction expansion at $x = \infty$. On
the other hand, a very similar algebraic problem:

$$\sum_{i=1}^{n} x_i = a; \qquad \sum_{i=1}^{n} x_i^{2m-1} = 0; \quad m = 2, \dots, n$$

does have an "explicit" solution, since the corresponding function $e^{-2a/x}$
has a classical continued fraction expansion derived by Euler. The polynomials
$P_n(x)$ arising in this special algebraic problem are well-known under the
name of Bessel polynomials.

An application of the solution of the transcendental "sums of odd cosines"
problem in PWM applications lies in the ability to construct high quality
digital (step-function) approximations to arbitrary harmonic series. Since the
proposed algorithm can be executed quite fast on any processor with
high-performance DSP capabilities, it opens the possibility of better
on-the-fly construction of arbitrary (analog) waveforms using simple digital
logic.


6 Completely Integrable Difference-Differential 
Equations of Korteweg-de Vries type. 


In connection with the Padé approximation solution to the "sum of odd powers"
problem, we can ask what "nonclassical" objects this solution is built
from. These objects are "soliton equations" and their solutions: isospectral
deformation equations of the full Korteweg-de Vries (KdV) hierarchy.

The only parameter left in our solution is the coefficient factor $C_n$ of the
three-term recurrence for the polynomials $P_n$:

$$P_{n+1}(x) = x \cdot P_n(x) + C_n \cdot P_{n-1}(x). \qquad (1)$$

This parameter $C_n$ is a function of $n$ and the whole generating sequence
$\{s_m : m\ \text{odd}\}$ of odd symmetric functions. It is $C_n$ that is a
solution of the KdV type hierarchy of completely integrable p.d.e.s in the
variables $s_m$. In addition to p.d.e.s in $s_m$, the parameter $C_n$
satisfies difference-differential equations in $n$ and each of $s_m$.

The formal derivation of the full hierarchy of such equations is based on
our paper [Ch3]. To see simply how one can derive them we can start with
(1). Think of $x$ as a spectral parameter: $x = \lambda$. Then (1) is an
eigenvalue problem for the second order linear difference operator in $n$. The
crucial next step is the realization that the polynomials $P_n$ also satisfy
linear differential equations in each of the variables $s_m$. Once such a
differential equation in $s_m$ is derived, one looks at the consistency
condition of this equation and (1). Such a consistency condition is a
classical isospectral deformation condition. This consistency condition
implies a nonlinear difference-differential equation on $C_n$ in $n$ and $s_m$
(for any odd $m$). The resulting nonlinear equation belongs to a completely
integrable class. Eliminating $n$ for $s_m$ and $s_{m'}$, one gets a KdV type
p.d.e. on $C_n$ in $s_m$ and $s_{m'}$.

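To see how these are derived let us look at the case of $m = 1$ and the
variable $s_1$. The corresponding partial differential equation on $P_n$ is
(the sign of $P_n$ is fixed by comparing the leading terms, since $P_n$ is
monic with second coefficient $-s_1$):

$$x \cdot P_{n, s_1} = -P_n + E_n \cdot P_{n-1} \qquad (2)$$

where the parameter $E_n$ does not depend on $x$. The consistency condition
between (1) and (2) leads to the following two difference-differential
equations

$$C_n = E_n \cdot E_{n+1}; \qquad C_{n, s_1} = E_{n+1} - E_n. \qquad (3)$$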

We also summarize here all relationships with Toeplitz determinants in
the expansion

$$G(x) = \sum_{m=0}^{\infty} \frac{c_m}{x^m}$$

of the approximated generating function of "odd sums of powers" $\{s_{2m-1}\}$:

$$G = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{s_m}{m\, x^m}\Bigr).$$


Specifically we use the standard representation of the main and the
auxiliary Toeplitz determinants:

$$D_n = \det\,(c_{i+j})_{i,j=0}^{n-1}, \qquad \Delta_n = \det\,(c_{i+j+1})_{i,j=0}^{n-1}.$$

The relationships between $C_n$ and the determinants are the following ones:

$$C_n = -\frac{\Delta_{n+1} \cdot \Delta_{n-1}}{\Delta_n^2}; \qquad \Delta_n^2 = (-1)^n\, 2\, D_n \cdot D_{n+1}.$$

The expression of $E_n$ in the difference-differential equations above is:

$$E_n^2 = -\frac{D_{n+1} \cdot D_{n-1}}{D_n^2}.$$

The main object $D_n$ is expressed in terms of the $\tau$-function (typical
notation for KdV equations):

$$D_n = \tau_n^2,$$

where $\tau_n$ is a polynomial in $s_{2m-1}$. We need, however, to normalize
the polynomial $\tau_n$ so that the leading power of $s_1$ (and it is
$s_1^{n(n-1)/2}$) would have a coefficient of 1. In this case we can write
more accurately:

$$D_n = r_n \cdot \tau_n^2.$$




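The rational numbers $r_n$ are easily determined using the following
expressions:

$$E_n = -\frac{1}{2n-1} \cdot \frac{\tau_{n+1} \cdot \tau_{n-1}}{\tau_n^2}, \qquad C_n = \frac{1}{4n^2-1} \cdot \frac{\tau_{n+2} \cdot \tau_{n-1}}{\tau_n \cdot \tau_{n+1}}.$$

This implies $r_{n+1} = -r_n^2/((2n-1)^2 \cdot r_{n-1})$. Substituting these
expressions in equations (3) one can get the difference-differential equation
on $\tau_n$ in $n$ and $s_1$ (with $x = s_1$):

$$\tau_{n+1,x} \cdot \tau_{n-1} - \tau_{n+1} \cdot \tau_{n-1,x} = (2n-1) \cdot \tau_n^2.$$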
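The smallest cases of these relations can be checked by hand. The worked example below is a sketch under stated assumptions: it takes $\tau_0 = \tau_1 = 1$, $\tau_2 = s_1$, $\tau_3 = s_1^3 - s_3$ (the first normalized polynomials listed later in the text) and reads $E_n$ and $C_n$ as $E_n = -\tau_{n+1}\tau_{n-1}/((2n-1)\tau_n^2)$ and $C_n = E_n E_{n+1}$.

```latex
% Assumed: \tau_0 = \tau_1 = 1, \tau_2 = s_1, \tau_3 = s_1^3 - s_3, and x = s_1.
\begin{align*}
E_1 &= -\frac{\tau_2\,\tau_0}{\tau_1^2} = -s_1, &
E_2 &= -\frac{1}{3}\,\frac{\tau_3\,\tau_1}{\tau_2^2} = -\frac{s_1^3 - s_3}{3 s_1^2}, \\
C_1 &= E_1 E_2 = \frac{s_1^3 - s_3}{3 s_1}, &
\partial_{s_1} C_1 &= \frac{2 s_1}{3} + \frac{s_3}{3 s_1^2} = E_2 - E_1 .
\end{align*}
% Direct check for two roots: x_1 + x_2 = s_1 and x_1^3 + x_2^3 = s_3 give
% s_3 = s_1^3 - 3 s_1 x_1 x_2, so P_2(x) = x^2 - s_1 x + x_1 x_2 has constant term C_1.
% Bilinear relation on \tau_n:
\begin{align*}
n = 1:&\quad \tau_{2,x}\,\tau_0 - \tau_2\,\tau_{0,x} = 1 = (2 \cdot 1 - 1)\,\tau_1^2, \\
n = 2:&\quad \tau_{3,x}\,\tau_1 - \tau_3\,\tau_{1,x} = 3 s_1^2 = (2 \cdot 2 - 1)\,\tau_2^2 .
\end{align*}
```

Both equations (3) and the bilinear relation hold in these cases, which also confirms the value $C_1 = (s_1^3 - s_3)/(3 s_1)$ obtained directly from the two-root problem.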


7 Korteweg-de Vries Hierarchy and all that 


Explicit recursion relations defining all commuting (higher) Korteweg-de Vries
(KdV) flows are well-known. These relations connect the infinite sequence of
conserved quantities $H_m$ of the original KdV equation (these are the infinite
dimensional Hamiltonians of the higher KdV equations) with the vector
flows $X_m$ of the ($m$-th) KdV equation. The $m$-th KdV is:

$$u_{t_m} = X_m(u); \quad \text{for } X_m(u) = \partial_x \frac{\delta H_m}{\delta u}.$$

Here $\delta H_m/\delta u$ is the gradient of the functional $H_m$ of $u$,
e.g., for the first KdV Hamiltonian (the actual KdV equation), we have:

$$\frac{\delta H_2}{\delta u} = \frac{3}{2} u^2 - \frac{1}{2} u'' \quad \text{for } H_2 = \int \Bigl(\frac{1}{2} u^3 + \frac{1}{4} (u')^2\Bigr)\, dx.$$

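The flows $X_m(u)$ are commuting (as Poisson structures induced by $H_m$).
The recursions connecting successive Hamiltonians are relatively simple:

$$\partial_x \frac{\delta H_m}{\delta u} = \Bigl(-\frac{1}{2} \partial_x^3 + 2u\, \partial_x + u_x\Bigr) \frac{\delta H_{m-1}}{\delta u}.$$

One can also write all higher KdV flows $X_m$ in the explicit form using this
recursion as follows:

$$X_m = N_u X_{m-1}; \qquad X_1(u) = u_x,$$

using the operator:

$$N_u = -\frac{1}{2} \partial_x^2 + 2u + u_x\, \partial_x^{-1}.$$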


The first few higher KdV equations thus are:

$$u_{t_3} = \frac{15}{2} u^2 u' - 5 u' u'' - \frac{5}{2} u u''' + \frac{1}{4} u^{(5)}.$$

One should not confuse the $x$ variable in the KdV equations with the $x$ used
in the "sums of odd powers" PWM problem. The variable $x$ in the PWM problem
is the spectral variable, usually denoted as $\lambda$ in

$$G = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{s_m}{m\, \lambda^m}\Bigr),$$

and the polynomials whose roots solve the PWM problem will be denoted
as $P_n(\lambda)$.

Further, the identification of the odd moments $s_{2m-1}$ with the canonical
variables of p.d.e.s in the KdV hierarchy is the following one:

$$x = s_1;$$

and for the higher flows "time variables" $t_m$ we have

$$t_m = \frac{s_{2m-1}}{(2m-1)\, 2^{m-1}}$$

(so the standard KdV time is $t = s_3/6$).


8 Solution to the PWM Problem via KdV rational 
Solutions 


The relationship between the KdV hierarchy of equations and the general
"sums of odd powers" problem is a very interesting one. Roughly speaking, the
parameters $C_n$ in the recurrence relation defining the orthogonal polynomials
in the "sums of odd powers" problem, as functions of the odd moments
$s_{2m-1}$, satisfy all p.d.e.s in the KdV hierarchy, with $x = s_1$ and the
higher flows "time variables" $t_m$ being
$t_m = \frac{s_{2m-1}}{(2m-1)\, 2^{m-1}}$.

Moreover, and this is what distinguishes the "sums of odd powers" case
and completely characterizes it in terms of KdV equations, rational solutions
to the KdV equation and to the full KdV hierarchy are completely
described by the solution to the "sums of odd powers" problem.

The class of all rational solutions to KdV, well-studied since 1977 and
still under a great deal of investigation today, has many interesting and
important properties: this class is a limit case of the famous $N$-soliton
solutions corresponding to special rational curves; it has a famous
many-particle interpretation in terms of the dynamics of poles of these
solutions in the $x$- (and $t_m$-) planes, etc.

The specific KdV relationship is the following. We look, as above, at the
orthogonal polynomials $P_n(\lambda)$ representing the Padé approximants to the
generating function $G$ of the sequence of odd moments $s_{2m-1}$:

$$G = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{s_m}{m\, \lambda^m}\Bigr)$$

and satisfying the three-term recurrence

$$P_{n+1}(\lambda) = \lambda \cdot P_n(\lambda) + C_n \cdot P_{n-1}(\lambda).$$


"Explicit" expressions for $P_n(\lambda)$ and $C_n$ involve, as above, Toeplitz
determinants in the coefficients $c_m$ of the expansion of $G$ at
$\lambda = \infty$:

$$G(\lambda) = \sum_{m=0}^{\infty} \frac{c_m}{\lambda^m}.$$

Specifically we use the standard representation of the main Toeplitz
determinant (as above):

$$D_n = \det\,(c_{i+j})_{i,j=0}^{n-1}.$$

There is also the auxiliary Toeplitz determinant $\Delta_n$:

$$\Delta_n = \det\,(c_{i+j+1})_{i,j=0}^{n-1}.$$

The relationships between $C_n$ and the determinants are the following ones:

$$C_n = -\frac{\Delta_{n+1} \cdot \Delta_{n-1}}{\Delta_n^2}; \qquad \Delta_n^2 = (-1)^n\, 2\, D_n \cdot D_{n+1}.$$

The main parameter $D_n$ is expressed in terms of the $\tau$-function of
KdV-type equations:

$$D_n = \tau_n^2,$$

where $\tau_n$ is a polynomial in $s_{2m-1}$.


The KdV solutions are expressed in terms of the potential $u_n$, very simply
related to $\tau_n$ as follows:

$$u_n = -2\, \partial_x^2 (\log \tau_n) = -\partial_x^2 (\log D_n).$$

(Here, as above, $x = s_1$.)
The relationship between the KdV hierarchy and the "sums of odd powers"
recurrences is the following one:

Theorem 2. If the generating sequence of odd moments $\{s_{2m-1}\}$ is
considered as a sequence of independent variables, then with the identification
$x = s_1$ and $t_m = \frac{s_{2m-1}}{(2m-1)\, 2^{m-1}}$, the $\tau$-functions
$\tau_n = \sqrt{D_n}$ are all rational solutions of the KdV equation (and all
commuting higher KdV equations). The polynomials $\tau_n$ are characterized by
their degree in $x$, which is $n(n-1)/2$.


An important consequence of this theorem is the characterization, for the
first time, of the full manifold of rational solutions as explicit functions of
the actual higher KdV natural parameters $t_m$ (see above the identification
$t_m = \frac{s_{2m-1}}{(2m-1)\, 2^{m-1}}$ with the odd moments variables). This
completes the study of rational solutions of KdV that we started more than 20
years ago [Ch4].

We write down explicitly the first few $\tau_n$, normalizing them (for
uniqueness) with the coefficient at $x^{n(n-1)/2}$ being 1 (remember that
$x = s_1$):

$$\tau_2 = s_1;$$

$$\tau_3 = s_1^3 - s_3;$$

$$\tau_4 = s_1^6 - 5 s_1^3 s_3 - 5 s_3^2 + 9 s_1 s_5;$$

$$\tau_5 = s_1^{10} - 15 s_1^7 s_3 - 175 s_1 s_3^3 + 63\,(5 s_1^2 s_3 + s_1^5 - 3 s_5)\, s_5 + 225\,(s_3 - s_1^3)\, s_7.$$


The recurrence that these normalized polynomials $\tau_n$ satisfy is a known
one (it also follows from the difference-differential equation on $C_n$ in $n$
and $s_1$); it contains a crucial ambiguity, hiding in the constants of
integration the explicit dependency on $s_j$:

$$\tau_{n+1,x} \cdot \tau_{n-1} - \tau_{n+1} \cdot \tau_{n-1,x} = (2n-1)\, \tau_n^2.$$


Theorem 2 and the direct relation to the Padé approximation problem to $G$
provide a much simpler theory of rational solutions to the KdV hierarchy than
all other descriptions (we refer to [AMM] and [Ch4] for an original exposition,
and [2] for the modern presentation and review).

An interesting corollary of Theorem 2 is that the polynomials $D_n$
($\tau_n$) and the rational functions $C_n$ are derived by means of successive
Bäcklund transformations from the initial $n = 0$ ($u = 0$) case.


9 New Difference - Differential Equations In 
Nonintegrable Cases - PWM Examples. 


Whenever the function $G$ that is expanded into its continued fraction does
not satisfy a Riccati equation over $\mathbb{C}(x)$, there is no simple
Painlevé equation/recurrence on the partial quotients $C_n$. Most $G$ fall into
this category, and examples of $G$ from PWM problems are not "integrable"
either. The most interesting example of $G$ depends on the parameter $a$
(voltage level):

$$G = e^{-4a(x - \sqrt{x^2 - 1})}$$

expanded at $x = \infty$. The corresponding partial quotient $C_n = C_n(a)$
has to be determined as a function of $a$ in order to compute the solutions
$P_n(x)$ of the main PWM problem. We show that $C_n(a)$, though not a Painlevé
function, satisfies an algebraic difference-differential equation in $n$ and
$a$.

We start with the definition of the Padé approximation to $G$:

$$q_n \cdot G - p_n = O(x^{-n-1}),$$

where $q_n = q_n(x)$, $p_n = p_n(x)$ are polynomials of degree $n$ in $x$, and
$p_n(x) = (-1)^n q_n(-x)$. From the expansion of $G$ at $x = \infty$ one gets
the leading coefficients of $q_n$:
$q_n = x^n + a\, x^{n-1} + q_{2,n}\, x^{n-2} + \dots$, where
$q_{2,n+1} = q_{2,n} + C_n$, or $q_{2,n} = \sum_{i=1}^{n-1} C_i$. $G$ satisfies
a linear p.d.e. over $\mathbb{Q}(x, a)$:

$$L \cdot G = 4aG \quad \text{for } L = (x^2 - 1) \cdot \partial_x - a \cdot x \cdot \partial_a.$$

We can differentiate the definition of the Padé approximation, and get:

$$q_n^{(1)} \cdot G - p_n^{(1)} = O(x^{-n});$$

$$q_n^{(1)} = (x^2 - 1)\, q_{n,x} - a x\, q_{n,a} + 4a\, q_n; \qquad p_n^{(1)} = (x^2 - 1)\, p_{n,x} - a x\, p_{n,a}.$$


Because of the orthogonality properties of $q_n(x)$, we can express
$q_n^{(1)}$ as a linear combination of only a few of the $q_m$'s:

$$q_n^{(1)} = \alpha_n \cdot q_{n-1} + 2a \cdot q_n + n \cdot x \cdot q_n.$$

Here we define an expression built from the previous $C_i$:

$$\alpha_n = -2 \sum_{i=1}^{n-1} C_i - a \sum_{i=1}^{n-1} C_{i,a} + 2a^2 - n.$$

As a result we have a linear partial difference-differential equation on $q_n$
in $n$, $a$ and $x$. This equation is compatible with the original three-term
linear recurrence on $q_n$. The consistency condition becomes our new equation
on $C_n(a)$. To represent this consistency condition in a transparent form we
use $2 \times 2$ matrices and a vector of two consecutive polynomials and the
corresponding shift operator $+$:

$$\vec{q} = \begin{pmatrix} q_n \\ q_{n-1} \end{pmatrix}, \qquad \vec{q}^{\,+} = \begin{pmatrix} q_{n+1} \\ q_n \end{pmatrix}.$$

Then we have the original recurrence written as

$$\vec{q}^{\,+} = M_n\, \vec{q}; \qquad M_n = \begin{pmatrix} x & C_n \\ 1 & 0 \end{pmatrix}.$$

We can write the expression of $q_n^{(1)}$ as follows:

$$L\, \vec{q} = K_n\, \vec{q}, \qquad K_n = \begin{pmatrix} nx - 2a & \alpha_n \\ \alpha_{n-1}/C_{n-1} & x(n-1) - 2a - x\,\alpha_{n-1}/C_{n-1} \end{pmatrix}.$$

The compatibility condition now becomes

$$L(M_n) + M_n \cdot K_n = (K_n)^+ \cdot M_n.$$

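Using this definition of $\alpha_n$ in the first equation we can write a single
differential equation in $a$ on $C_n$ that involves only $C_{n-1}$ and $C_m$
for $m < n - 1$:

$$C_{n,a} = -\varepsilon_{n-1} \frac{C_n}{a} + \frac{\alpha_n}{a}$$

for

$$\varepsilon_{n-1} = \frac{\alpha_{n-1}}{C_{n-1}} + 2.$$

Its solution can be written in quadratures as follows:

$$C_n = \exp\Bigl(-\int \frac{\varepsilon_{n-1}}{a}\, da\Bigr) \cdot \Bigl(\int \frac{\alpha_n}{a} \exp\Bigl(\int \frac{\varepsilon_{n-1}}{a}\, da\Bigr) da + \mathrm{Const}\Bigr).$$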


The initial condition on $C_n$ as a function of $a$ is: $C_n|_{a=0} = -1/4$.
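The differential equation and the initial condition can be spot-checked in exact rational arithmetic for $n = 2$. The sketch below is illustrative only: it assumes the closed forms $C_1 = (4a^2 - 3)/12$ and $C_2 = (45 - 60a^2 + 16a^4)/(60(4a^2 - 3))$ quoted later in the text, and it reads $\alpha_n$ as $-2\sum C_i - a \sum \partial C_i/\partial a + 2a^2 - n$ with both sums over $i = 1, \dots, n-1$. All function names are hypothetical.

```python
from fractions import Fraction

def C1(a):
    return (4 * a * a - 3) / Fraction(12)

def dC1(a):
    return Fraction(2, 3) * a          # derivative of C1 in a

def C2(a):
    return (16 * a**4 - 60 * a**2 + 45) / (60 * (4 * a * a - 3))

def dC2(a):
    # Quotient rule applied to the closed form of C2.
    num, dnum = 16 * a**4 - 60 * a**2 + 45, 64 * a**3 - 120 * a
    den, dden = 60 * (4 * a * a - 3), 480 * a
    return (dnum * den - num * dden) / (den * den)

def alpha1(a):
    return 2 * a * a - 1               # both sums are empty for n = 1

def alpha2(a):
    return -2 * C1(a) - a * dC1(a) + 2 * a * a - 2

def residual(a):
    """dC2/da minus the right-hand side -eps_1 * C2 / a + alpha_2 / a (requires a != 0)."""
    eps1 = alpha1(a) / C1(a) + 2
    return dC2(a) - (-eps1 * C2(a) / a + alpha2(a) / a)

if __name__ == "__main__":
    for a in (Fraction(1), Fraction(2), Fraction(2, 3)):
        print(a, residual(a))
    print(C1(Fraction(0)), C2(Fraction(0)))
```

The residual vanishes at the sampled rational values of $a$, and both closed forms take the value $-1/4$ at $a = 0$.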

A more detailed algebraic analysis of $C_n$ as a rational function of $a$
reveals more algebraic structure, not dissimilar to that of solutions of
Painlevé equations. The canonical determinant is

$$\Delta_n = \det\,(c_{i+j+1})_{i,j=0}^{n-1}$$

in $G = \sum_{n=0}^{\infty} c_n/x^n$. In these notations we have

$$C_n = -\frac{\Delta_{n+1} \Delta_{n-1}}{\Delta_n^2}.$$

Canonical polynomials $w_n$ in $a$:

$$\deg w_n = (n+1)^2/2$$

for odd $n$,

$$\deg w_n = n(n+2)/2$$

for even $n$.
In terms of the polynomials $w_k$ we get the following expressions

$$C_k = \nu_k \cdot \frac{w_{k-3} \cdot w_k}{w_{k-2} \cdot w_{k-1}},$$

$$\Delta_n = \mathrm{const}_n \cdot w_{n-1} \cdot w_{n-2} \cdot a^n.$$
Nk 2k+3 
Τὺκ- 1 2k --Ἰ 


We can write then 
od &m = gata, Wn 
Wn—2 
We can get the following expression for $w_n$ (and $C_n$) in terms of $w_m$ for
$m < n$:


[Ξ .αὐῊΣ.. πο’ da + Const = nz - a2"+? ὦ 
a Wn—3 Wn —2 


These differential equations or integral representations allow for a simple
recursive computation of $C_n(a)$ as a rational function from $\mathbb{Q}(a)$.




10 Complexity of Computations of PWM polynomials 
and solutions 


What is the complexity of computations of PWM polynomials and their roots,
i.e., of the solution to the classical PWM problem? One can ask the same
question about all $n$ first sums of powers. If one uses just Newton
identities, the complexity is $O(n^2)$, but a much faster scheme can be found.
The key to this is:

$$P(x) = x^n \cdot \exp\Bigl(-\sum_{m=1}^{\infty} \frac{s_m}{m\, x^m}\Bigr).$$

Indeed, according to Brent's theorem $N$ terms of the power series expansion
of $e^{V(x)}$ can be computed in only $O(N \log N)$ steps from the power series
expansion of $V(x)$ (see [K1], Sec. 4.7, Ex. 4). This algorithm requires only
the use of the FFT technique for the computation of fast convolution.

Similar complexity considerations can be applied to the problem of fast
computation of the polynomials $P_n(x)$ that give the solution to the classical
PWM problem of consecutive odd power sums. In fact, the arguments presented in
this and the following sections apply without significant changes to the
solution of the general OPWM problem that uses two-point Padé approximations.
A classical Levinson algorithm for solving systems of Toeplitz linear
equations, or algorithms based on the three-term recurrence relations satisfied
by $P_m(x)$, give the complexity of computations of (all coefficients of)
$P_n(x)$ as $O(n^2)$. These algorithms are, perhaps, the best in the range of
moderate $n$ because of their simple nature and the fact that they use almost
no additional space. Also the $O(n^2)$ complexity algorithm determines not only
the single $P_n(x)$ but all $P_m(x)$ for $m < n$. For large $n$ these
algorithms become impractical. Thus one needs to use fast algorithms.

This is how a fast algorithm of computation of (all coefficients of) $P_n(x)$
of the total complexity of $O(n \log^2 n)$ works. First, one has to apply
Brent's theorem to compute $O(n)$ terms of the power series expansion (at
infinity, $x = \infty$) of

$$G(x) = \exp\Bigl(-2 \sum_{m\ \mathrm{odd}} \frac{s_m}{m\, x^m}\Bigr)$$

from the first $O(n)$ known coefficients $s_m$ with the complexity of only
$O(n \log n)$. Then one has to use fast Padé approximation algorithms (or
equivalent fast polynomial gcd algorithms). There are a variety of these
algorithms, with the most popular from [BGY]. Its complexity is
$O(n \log^2 n)$. Thus we can compute $P_n(x)$ in at most $O(n \log^2 n)$
operations. Of course, this method should be used only for a large $n$ (with
additional precision of calculations), since it relies on a variety of
extensions of FFT methods.




11 A Simple Algorithm for Orthogonal Polynomials 
Computations 


A simple algorithm of computation of all $P_m(x)$ for all $m = 0, \dots, n$,
having the complexity of $O(n^2)$, that we mentioned above, is easy to
describe. Let us look at the expansion of $G(x)$ at $x = \infty$, written in
the following form:

$$G(x) = \sum_{k=0}^{\infty} c_k \cdot x^{-k}.$$

By Theorem 1 the polynomials $P_n(x)$ are defined from the Padé approximation
problem to $G(x)$, i.e., the remainder function

$$R_n(x) = P_n(-x) \cdot G(x) - (-1)^n \cdot P_n(x)$$

has the following expansion at $x = \infty$: $R_n(x) = O(x^{-n-1})$. Also,
since the $P_n(x)$ are orthogonal polynomials, they satisfy the three-term
recurrence

$$P_{n+1}(x) = x \cdot P_n(x) + C_n \cdot P_{n-1}(x)$$

for $n = 0, \dots$. The initial conditions can be chosen here as:
$P_{-1} = 1$, $P_0 = 1$. Let us write $P_n(x)$ in terms of its coefficients:
$P_n(x) = \sum_{i=0}^{n} p_{n,i} \cdot x^i$.

If $P_{n-1}(x)$ and $P_n(x)$ are known, then in order to determine the single
unknown $C_n$, and thus $P_{n+1}(x)$ (via the recurrence), we have to look at
the coefficient at $x^{-n}$ of $R_{n+1}(x)$, which has to vanish. Assume as
before that the leading coefficient of $P_n(x)$ is 1, i.e., $p_{n,n} = 1$.
Looking at the coefficient at $x^{-n}$ in the expansion of
$R_{n+1}(x) = -x \cdot R_n(x) + C_n \cdot R_{n-1}(x)$ we get the following
expression for $C_n$ (for $n \ge 1$; for $n = 0$ the recurrence with
$P_{-1} = 1$ gives $C_0 = c_1/2$):

$$C_n = \frac{\sum_{i=0}^{n} (-1)^i\, p_{n,i} \cdot c_{n+i+1}}{\sum_{i=0}^{n-1} (-1)^i\, p_{n-1,i} \cdot c_{n+i}}.$$

Once $C_n$ is determined, the coefficients $p_{n+1,i}$ are easily determined
recursively.
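The $O(n^2)$ procedure can be sketched in a few lines of exact rational arithmetic. This is an illustration under stated assumptions, not the authors' implementation: it takes $G(x) = \sum_k g_k x^{-k}$ without alternating signs, computes the $g_k$ by the plain quadratic recurrence coming from $G' = V' G$ (not Brent's fast method), and obtains $C_k$ as the ratio of the leading remainder coefficients of $R_k$ and $R_{k-1}$. The function names are hypothetical.

```python
from fractions import Fraction

def series_of_G(odd_sums, terms):
    """g_0..g_{terms-1} with G(x) = exp(-2 * sum_{m odd} s_m / (m x^m)) = sum_k g_k / x^k.
    odd_sums = [s_1, s_3, s_5, ...]; power sums beyond the list are taken as 0."""
    v = [Fraction(0)] * terms
    for idx, s in enumerate(odd_sums):
        m = 2 * idx + 1
        if m < terms:
            v[m] = Fraction(-2) * Fraction(s) / m
    g = [Fraction(1)] + [Fraction(0)] * (terms - 1)
    for k in range(1, terms):  # k * g_k = sum_j j * v_j * g_{k-j}
        g[k] = sum(j * v[j] * g[k - j] for j in range(1, k + 1)) / k
    return g

def pwm_polynomials(odd_sums):
    """Return ([P_0, ..., P_n], [C_0, ..., C_{n-1}]); each P_k lists coefficients from x^0 up."""
    n = len(odd_sums)
    g = series_of_G(odd_sums, 2 * n)
    p_prev, p_cur = [Fraction(1)], [Fraction(1)]  # P_{-1} = 1, P_0 = 1
    rho_prev = Fraction(2)                         # x^0 coefficient of R_{-1} = G + 1
    polys, C = [p_cur], []
    for k in range(n):
        # rho_k: coefficient of x^-(k+1) in R_k(x) = P_k(-x) G(x) - (-1)^k P_k(x)
        rho = sum((-1) ** i * p * g[k + 1 + i] for i, p in enumerate(p_cur))
        c_k = rho / rho_prev                       # raises if the Pade problem is degenerate
        p_next = [Fraction(0)] + p_cur             # x * P_k
        for i, p in enumerate(p_prev):
            p_next[i] += c_k * p                   # + C_k * P_{k-1}
        C.append(c_k)
        polys.append(p_next)
        p_prev, p_cur, rho_prev = p_cur, p_next, rho
    return polys, C

if __name__ == "__main__":
    # Odd power sums of the roots 1, 2, 3: s_1 = 6, s_3 = 36, s_5 = 276.
    polys, C = pwm_polynomials([6, 36, 276])
    print(polys[-1], C)
```

For the odd power sums of the roots 1, 2, 3 the last polynomial is $x^3 - 6x^2 + 11x - 6$, so the roots are recovered from $s_1$, $s_3$, $s_5$ alone. A zero value of a leading remainder coefficient signals a degenerate Padé problem, which this sketch does not handle.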

In order for this algorithm to work, one needs the coefficients $c_i$ in the
expansion of $G(x)$. As we mentioned above, the complexity of Brent's algorithm
of computation of $c_i$ up to $O(n)$ that uses FFT is $O(n \log n)$.

In fact often the complexity is bounded only by $O(n)$. For example, in the
most interesting case of the PWM problem:

$$G(x) = G_a(x) = e^{-4a(x - \sqrt{x^2 - 1})}$$

the algorithm for computing $c_i$ is a very simple one of complexity $O(n)$
only.
This algorithm follows the authors' general power series algorithms [Ch5].
The key to this algorithm is to notice that $G(x)$ satisfies a second order
linear differential equation (with singularities at $x = -1, 1, \infty$ and an
apparent singularity at $x = 0$):

$$y'' \cdot x(x^2 - 1) + y'\,(8ax(x^2 - 1) + 1) + y\,(4a(1 - 4ax)) = 0.$$

If we look at the expansion of $G_a(x)$ at $x = \infty$:
$G_a(x) = \sum_{n=0}^{\infty} c_n/x^n$, then we get the 4-term recurrence on
$c_n$:

$$c_{n+2} = \frac{1}{8a(n+2)} \cdot \bigl(c_{n+1}((n+1)^2 + (n+1) - 16a^2) + c_n(8an + 4a) + c_{n-1}(-(n-1)^2 - 2(n-1))\bigr).$$

The initial conditions are $c_n = 0$ for $n < 0$ and

$$c_0 = 1, \quad c_1 = -2a, \quad c_2 = 2a^2, \dots$$
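The 4-term recurrence gives the coefficients of $G_a$ in $O(n)$ arithmetic operations. The following is a minimal sketch with a hypothetical function name; it assumes the expansion $G_a(x) = \sum_n c_n/x^n$ without alternating signs, starts from $c_0 = 1$ and $c_n = 0$ for $n < 0$, and requires $a \ne 0$.

```python
from fractions import Fraction

def ga_series(a, terms):
    """c_0..c_{terms-1} of G_a(x) = exp(-4a(x - sqrt(x^2 - 1))) = sum_n c_n / x^n,
    computed with the 4-term recurrence; a must be nonzero."""
    a = Fraction(a)
    c = {-2: Fraction(0), -1: Fraction(0), 0: Fraction(1)}
    for n in range(-1, terms - 2):
        numerator = (c[n + 1] * ((n + 1) ** 2 + (n + 1) - 16 * a * a)
                     + c[n] * (8 * a * n + 4 * a)
                     + c[n - 1] * (-(n - 1) ** 2 - 2 * (n - 1)))
        c[n + 2] = numerator / (8 * a * (n + 2))
    return [c[k] for k in range(terms)]

if __name__ == "__main__":
    print(ga_series(1, 5))
```

Starting the loop at $n = -1$ reproduces $c_1 = -2a$ and $c_2 = 2a^2$ from $c_0 = 1$ alone, so only $c_0$ has to be supplied.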


From these values and the recurrence for $c_n$ one derives the $C_n$ factors in
the three-term recurrence for the orthogonal polynomials $P_n(x)$. Here are a
few initial $C_n$:

$$C_0 = -a, \qquad C_1 = \frac{4a^2 - 3}{12}, \qquad C_2 = \frac{45 - 60a^2 + 16a^4}{60\,(4a^2 - 3)}.$$

Here all $C_n$ are rational functions in $a$ of rather special structure. Since
the case of the continued fraction expansion of $G(x)$ is not "explicitly
solvable", $C_n$ is not a "known" function of $a$ and $n$. An important
consequence of the "unsolvability" of $C_n$ is the growth of the coefficients
of $C_n$ as rational functions in $a$ with integer coefficients. According to
standard conjectures about explicit and non-explicit continued fraction
expansions (see [Ch1]), the coefficients of $C_n$ as a rational function in
$a$ over $\mathbb{Z}$ are growing as $e^{O(n^2)}$ for large $n$. In fact, for
$n > 12$, the coefficients of $C_n$ in $a$ are large integers. This makes it
impractical to precompute $C_n$ with full accuracy for large $n$. It is also
unnecessary to analytically determine $C_n = C_n(a)$ explicitly, since we need
to know $C_n(a)$ only in the range of $a$ that is significant for applications:
this is the range where the weight of the orthogonal polynomials $P_n(x)$ is
real.

Let us show the polynomial $P_9(x) = P_9(x, a)$ explicitly:


(— 1099511627776 a4} + 49478023249920 a* x + 
3809807790243840 αὖϑ x (—3 + 4.52) — 272129127874560 a9 (—1 + 427) -- 
714338960670720 αἴ (43 — 328 x? + 208 x*) + 714338960670720 a*® x (1685 — 
4280 x? + 1456 x*) + 139296097330790400 a*4 x (—551 + 2005 x? — 1316 x24 + 
128 x°) — 11608008110899200 a®°(—181 + 198027 — 240027 + 4482°) — 
4673280144593651143490437500000 a? x (3— 45 17+ 180 24 — 264 1® + 128 x8) + 
438120013555654794702228515625 x (5 — 8027 + 336 x* — 512 x° + 256 2°) — 
438120013555654794702228515625 a (1 — 40 x? + 240 27 — 448 27° + 256 8) + 
2336640072296825571745218750000 a? (1 — 40 x? + 240 x4 — 448 x® + 256 5) + 
34824024332697600 a*? x (95319 —442680 x?+420084 x14 — 82368 7©+ 1088 x8) — 
2742391916199936000 a®° x (37688 — 2098332? + 25588024 — 752962° + 
2176 “8) — 17412012166348800 α (5605 — 78334 x? + 136328 x4 — 49952 “ + 
2176 x®) + 311552009639576742899362500000 a4 x (123 — 1732 x? + 6544 x4 — 
9120 2° + 42242) — 311552009639576742899362500000 a° (17 — 67227 + 




4000 «* — 7424 x® + 4224 x) — 118686479862695902056900000000 αὖ x (499 — 
6630 x? + 23560 x* — 30816 r® + 13376 7°) + 274239191619993600 a?! (11921 — 
19992027 2+ = 444760 x4 — 2388802 4+ 217602°) + 
59343239931347951028450000000a7(115 - 44442? + 2598427 
A75522° + 2675225) +  5656183327162368000 a?8 x (424359 
272379627 + 4003184274 -ὀ 156108828 + 739842%) 
7912431990846393470460000000 a® x (7341 — 9259222 + 30953627 
376640 “9 + 150016 x°) — 7912431990846393470460000000 a9 (715 — 26788 x? 
152384 27 — 272256 2° + 150016 2°) — 1885394442387456000 a?9 (43413 
84006022 Ὁ 224313624 = =9—)—Ss«156505625 + 221952 2%) 
1918165331114277204960000000 a!° x (20157 — 2423682? + 763992 x4 
860800 z° + 3080962°) + 959082665557138602480000000 a! (3375 
12187222 + 66896024 — 115500825 + 61619228) 
29978652305571121152000000 a!4 x (211983 — 23204802? + 652434024 — 
6264792 2° + 1704864x7°) — 4342888260886855680000 a2? x (1424718 
1227132527 + 2598097274 — 169348807 + 207808028) + 
358288281523165593600000 α΄ (48491 — 144329822 + 642214024 
87298122° + 34907122°) + 11241994614589170432000000 α΄ (36587 — 
12109927 - 6064752274 - 94878082 + 45463042°) -“-- 
2828091663581184000 αὖθ (15025965 — 108265340 x? + 18422728824 — 
88762560 7° + 6125440278) + 1414045831790592000 a2? (1107681 — 
2407989627 + 7419433624 — 629260807 + 122508802°) + 
37118703084503040000 a?4 x (15659431 — 124251072x? + 238180224274 — 
135547008 “Ὁ + 12747008 x®) + 47025336949915484 1600000 a1 x (3532305 — 




3675547022 + 9731934824 -— 859241602 + 1979616028) + 
361907355073904640000 a2 (743683 — 1935194422 + 7380844824 — 
82706240.x® + 24936960 x8) — 44786035190395699200000 a!® x (7416360 — 
7299257722 + 18102355624 — 14610648025 + 2792569628) + 


2810498653647292608000000 a!? x (6507637 — 74700980 x? + 222417104 24 — 
231548736 “Ὁ + 72980736 2°) — 2810498653647292608000000 a!3 (475795 — 


1648401627 + 8670164824 -- 14321171275 + 729807362°) — 
117563342374788710400000 a!” (818493 — 2576209827 + 12205516824 — 
178987040 x°® + 791846402°) -- 5302671869214720000 α (4380839 — 
10495005227 + 36326416024 — 3592059522° + 892290562°) + 


1357152581527142400000 a?° x (37857867 — 350070756 x? + 806628912 α΄ — 
589406592 r° + 91422464 7°) — 271430516305428480000 a?! (9003483 — 
251665140 x? + 1042516880 x4 — 1296841600 x® + 457112320 x8)) / 


(8821612800 (12714083695698776015625 ~— 67808446377060138750000 a2 + 
149178582029532305250000 a4 - 179961464035626273000000 a® + 
134554919202394891200000 a8 - 66992179236769987200000 a!” + 
23251106676342504960000 a?? - 5793671320995317760000 a!4 + 
1055273129097707520000 a!® — 141774665486598144000 al + 
14064801509670912000 αὖ -- 1023040734363648000 αΖ2 Ἔ 


53635589760614400 a24 -- 1963734545203200 a2® + 47436571607040 α28 
676457349120 a*° + 4294967296 α2)) 




12 Fast Algorithms For Computing Solutions to PWM 
Problems 


We already know that, while the slow algorithms for computing $P_n(x)$ can be completed in $O(n^2)$ operations, the fast FFT-based algorithms can be completed in $O(n \log^2 n)$ operations. After $P_n(x)$ is determined, we need, in addition, to determine the set $\{x_i\}$ of all roots of $P_n(x)$. One could use general methods for computing roots of univariate polynomials, but that would raise the overall complexity. We do not need to do so in our case, because the orthogonality properties of $P_n(x)$ allow fast and numerically stable methods of computing $\{x_i\}$. We present one such algorithm, suitable for moderate and large $n$.

Fast polynomial root finding first requires "root separation": the determination of domains in the complex plane containing only one root each. Next, individual roots are "polished", i.e., computed with high accuracy, usually using Newton-type iterative algorithms. Both parts of this program of polynomial root-finding can be accomplished very well for the polynomials $P_n(x)$ in the PWM problems relevant for applications.

In all practically relevant cases of the PWM transcendental "sums of odd cosines" problem, $x_i = \cos\alpha_i$ for $i = 1, \ldots, n$ with real $\alpha_i$. This means that the $x_i$ are real and lie in the interval $[-1, 1]$ for all $i$. Because of the orthogonality properties of $P_n(x)$, we get the Sturm property: the roots of $P_{n-1}(x)$ separate the roots of $P_n(x)$. This provides a variety of methods of separating the roots of any given $P_n(x)$. The simplest method, of overall complexity $O(n^2 \log^2 n)$, is to compute the roots of $P_m(x)$ for all $m = 0, \ldots, n$ recursively, using the roots of the previous polynomial to separate and then accurately compute the roots of the next polynomial. Using fast algorithms for computing Padé approximations, this method can be improved to an overall complexity of only $O(n \log^2 n)$ operations for computing the set $\{x_i\}$ of all roots of a polynomial $P_n(x)$.

For these fast algorithms we use fast polynomial evaluation: for any given set of $N$ points $\{X_i\}$ and a polynomial $P(X)$ of degree $N$ in $X$, one needs at most $O(N \log^2 N)$ operations to evaluate $P(X_i)$ for all $i = 1, \ldots, N$. Using this technique one can rapidly find the true roots $X_i$ of the polynomial $P(X)$, starting with approximate (but well-separated) values $X_{0,i}$ of these roots. In the case of simple roots (and for all physically relevant PWM problems the roots are simple) one uses the classical Newton-Raphson method.

For the orthogonal polynomials $P_n(x)$ arising from the algebraic version of the PWM problem, we can use fast algorithms for evaluating $P_n(x)$ and $P_n'(x)$ at $n$ points to run $n$ Newton-Raphson approximations at the same time:

$$x_i := x_i - \frac{P_n(x_i)}{P_n'(x_i)}.$$


To get a rapid (geometric) rate of convergence of this algorithm one needs initial values of $x_i$ corresponding to the centers of the intervals separating the roots of $P_n(x)$.

A fast algorithm arises when we start directly from an initial approximation to the $x_i$ for the given $P_n(x)$. Such an approximation can be rigorously derived using the classical properties of orthogonal polynomials (see [S]). Roughly, the leading term in the $1/n$-expansion of the (properly ordered) roots $x_i$ of $P_n(x)$ is given asymptotically as follows:

$$x_i = \cos\alpha_i, \qquad \alpha_i \sim \frac{\pi i}{n}, \qquad i = 1, \ldots, n.$$

More accurately, the $\alpha_i$ are separated by $O(1/n)$: if these angles are ordered, then $A/n < \alpha_i - \alpha_{i+1} < B/n$. Thus one can choose $O(n/\epsilon)$ total starting points $x$ for the Newton-Raphson iterations with the property that any real root $x_i$ is within distance $\epsilon/n$ of at least one starting point $x$ of the iteration. In the case of a fixed machine precision the number of iterations is constant, providing us with a fast algorithm for computing the set $\{x_i\}$ with at most $O(n \log^2 n)$ operations.
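
The simultaneous Newton-Raphson polishing can be illustrated on a stand-in family. The sketch below is an example under stated assumptions, not the PWM polynomials themselves: it uses Legendre polynomials (also orthogonal on $[-1, 1]$), a plain three-term-recurrence evaluation per point instead of fast multipoint evaluation, and the standard starting angles $\pi(i - 1/4)/(n + 1/2)$ for Legendre roots. The names `legendre_pair` and `newton_roots` are hypothetical.

```python
import math


def legendre_pair(n, x):
    """Return (P_n(x), P_{n-1}(x)) for n >= 1 via (k+1)P_{k+1} = (2k+1)xP_k - kP_{k-1}."""
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p, p_prev


def newton_roots(n, iterations=10):
    """Polish all n roots of P_n at once, starting from cosines of equispaced-like angles."""
    xs = [math.cos(math.pi * (i - 0.25) / (n + 0.5)) for i in range(1, n + 1)]
    for _ in range(iterations):
        updated = []
        for x in xs:
            p, p_prev = legendre_pair(n, x)
            # Derivative from the identity (x^2 - 1) P_n'(x) = n (x P_n(x) - P_{n-1}(x)).
            dp = n * (x * p - p_prev) / (x * x - 1.0)
            updated.append(x - p / dp)
        xs = updated
    return sorted(xs)
```

Because the starting points already separate the roots, a fixed small number of iterations suffices at machine precision, which is the property the complexity estimate relies on.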


13 Solvable Extensions of the Classical PWM Problem 


A variety of extensions of the "odd sums of powers" problem can be solved using Padé approximation techniques. For this one uses the methods of generalized graded Padé approximations developed by the authors. Some of these problems also arise in practical applications of signal processing (phase unwrapping, channel identification, etc.). A particular example, generalizing the "odd sums of powers" problem, is the problem where for given $n$ and $N > 1$ one knows the first $n$ consecutive sums of powers of $\{x_i\}$ ($i = 1, \ldots, n$) except every $N$-th one (i.e., those with exponents $1, \ldots, N-1, N+1, \ldots$). This problem is analytically solved using simultaneous Padé approximations to $N - 1$ functions, in a way almost identical to the one presented above for $N = 2$.


14 The complete solution to the OPWM Problem 


The main ingredients of the solution to the OPWM problem are:

1) Analytic solution of the underlying approximation problem by reduction to the problem of two-point Padé approximation and orthogonal polynomials on the unit circle;

2) Determination of the variational part of the OPWM problem (the dependency on the amplitudes of different frequency bands) in terms of the completely integrable commuting Hamiltonian flows of sine-Gordon and Korteweg-de Vries type.


14.1 The analytic approach to the OPWM Problem. 


Let us look at the periodic $\pm 1$ pulse train $f(t)$ with $2N$ pulses in the fundamental period $[0, 2\pi]$ and the corresponding pulse edges at $a_i$:

$$0 = a_0 < a_1 < a_2 < \cdots < a_{2N+1} < a_{2N+2} = 2\pi$$

(normalized at $f(0) = -1$, $f(2\pi) = 1$). Let us look at the Fourier expansion of $f(t)$:

$$f(t) = b_0 + \sum_{n=1}^{\infty} a_n \sin nt + b_n \cos nt.$$

Let us now define the (complex) Fourier coefficient $c_n$ of $f(t)$:

$$c_n = b_n + i\,a_n; \qquad c_n = \frac{1}{\pi} \int_0^{2\pi} f(t)\,e^{int}\,dt$$

for $n > 0$. Then the Fourier coefficients in the expansion of the pulse train $f(t)$ have the following form:

$$\pi b_0 = \sum_{j=1}^{N} a_{2j} - \sum_{j=1}^{N+1} a_{2j-1} + \pi;$$

$$c_n = \frac{-2i}{\pi n} \Bigl\{ \sum_{j=1}^{N} e^{i n a_{2j}} - \sum_{j=1}^{N+1} e^{i n a_{2j-1}} + 1 \Bigr\}$$

for $n > 0$.
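
The edge formulas for $b_0$ and $c_n$ can be checked numerically. The following sketch assumes the normalization $c_n = \frac{1}{\pi}\int_0^{2\pi} f(t) e^{int}\,dt$ and $b_0 = \frac{1}{2\pi}\int_0^{2\pi} f(t)\,dt$, with $f = -1$ on $[a_0, a_1)$ and a sign flip at every interior edge; it compares the closed forms with a plain midpoint rule. The function names and the sample edge values are hypothetical.

```python
import cmath
import math


def pulse_train(t, edges):
    """+-1 pulse train: -1 on [a_0, a_1), then the sign flips at every interior edge."""
    sign = -1.0
    for a in edges[1:-1]:
        if t >= a:
            sign = -sign
    return sign


def closed_form(edges, n):
    """Return (b_0, c_n) from the edge formulas; edges = [0, a_1, ..., a_{2N+1}, 2*pi]."""
    odd = edges[1:-1][0::2]   # a_1, a_3, ..., a_{2N+1}
    even = edges[1:-1][1::2]  # a_2, a_4, ..., a_{2N}
    b0 = (sum(even) - sum(odd) + math.pi) / math.pi
    s = (sum(cmath.exp(1j * n * a) for a in even)
         - sum(cmath.exp(1j * n * a) for a in odd) + 1)
    return b0, -2j / (math.pi * n) * s


def by_quadrature(edges, n, samples=100_000):
    """Midpoint-rule estimates of b_0 and c_n straight from their integral definitions."""
    h = 2 * math.pi / samples
    b0, cn = 0.0, 0j
    for k in range(samples):
        t = (k + 0.5) * h
        f = pulse_train(t, edges)
        b0 += f * h
        cn += f * cmath.exp(1j * n * t) * h
    return b0 / (2 * math.pi), cn / math.pi
```

With, say, `edges = [0.0, 0.5, 1.2, 2.0, 3.5, 5.0, 2 * math.pi]` (so $N = 2$), the two functions agree to within the quadrature error, which is of the order of the step size because $f$ is discontinuous.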
To put this expansion in a more symmetric form, we associate points $z_j$ on the unit circle with the phases (angles) $a_j$:

$$z_j = e^{i a_j} \quad \text{for } j = 1, \ldots, 2N+1.$$

This allows us to represent the Fourier coefficients $b_0$, $c_n$ of $f(t)$ in an invariant way in terms of the $z_j$:

$$-e^{-i\pi b_0} = \frac{\prod_{j=1}^{N+1} z_{2j-1}}{\prod_{j=1}^{N} z_{2j}},$$

$$-\frac{\pi n i}{2}\,c_n = \sum_{j=1}^{N+1} z_{2j-1}^n - \sum_{j=1}^{N} z_{2j}^n - 1$$

(for $n > 0$).

Finally we can define the main objects associated with the OPWM problem. These are the polynomials $Q_{N+1}(z)$, $P_N(z)$, whose roots are the positive (respectively, negative) edges of the pulses in $f(t)$:

$$Q_{N+1}(z) = \prod_{j=1}^{N+1} (z - z_{2j-1}), \qquad P_N(z) = \prod_{j=1}^{N} (z - z_{2j}).$$

From these polynomials a rational 2-point approximation $Q_{N+1}(z)/P_N(z)$ (at $z = 0$ and at $z = \infty$) is formed. The following expansion of this rational function is crucial for the spectral and orthogonal polynomial interpretation:

$$\frac{Q_{N+1}(z)}{P_N(z)} = z \cdot e^{-\sum_{M=1}^{\infty} \frac{s_M}{M z^M}} \quad \text{at } z = \infty.$$

Here we define the sequence of complex numbers $\{s_M\}$ in terms of the Fourier expansion $\{c_M\}$:

$$s_M = \frac{-M\pi i}{2}\,c_M + 1 \quad \text{for } M = 1, \ldots.$$

If we denote by $P^{-}(z)$ the reciprocal of a polynomial $P(z)$ (with leading coefficient one), then we have the second expansion formula of the rational approximation:

$$\frac{Q^{-}_{N+1}(z)}{P^{-}_{N}(z)} = z \cdot e^{-\sum_{M=1}^{\infty} \frac{\bar{s}_M}{M z^M}} \quad \text{at } z = \infty.$$

Here $\bar{s}_M$ is the complex conjugate of $s_M$. These expansions follow from Newton's original formula for the sums of powers symmetric functions, and represent the expansions at $z = 0$ and at $z = \infty$ of the rational function $Q_{N+1}(z)/P_N(z)$. One has only to add the definition of $b_0$ in terms of the constant coefficients of the polynomials $Q_{N+1}(z)$, $P_N(z)$ (given above):

$$\frac{q_0}{p_0} = e^{-i\pi b_0},$$

where $q_0$, $p_0$ are, respectively, the constant coefficients of $Q_{N+1}(z)$, $P_N(z)$.
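
The expansion at $z = \infty$ and the relation for $q_0/p_0$ can be checked numerically for any choice of edges. The sketch below assumes that $s_M$ is the difference of the $M$-th power sums of the odd-indexed and the even-indexed points $z_j = e^{i a_j}$, and that $\pi b_0 = \sum a_{2j} - \sum a_{2j-1} + \pi$; the function names are hypothetical.

```python
import cmath
import math


def check_expansion(angles, z, terms=80):
    """angles = [a_1, ..., a_{2N+1}]; return (Q/P at z, truncated z*exp(-sum s_M/(M z^M)))."""
    zs = [cmath.exp(1j * a) for a in angles]
    odd, even = zs[0::2], zs[1::2]  # roots of Q_{N+1} and of P_N
    q = p = 1 + 0j
    for r in odd:
        q *= z - r
    for r in even:
        p *= z - r
    # s_M = (sum of M-th powers of odd-indexed z_j) - (same for even-indexed z_j)
    series = sum(
        (sum(r ** M for r in odd) - sum(r ** M for r in even)) / (M * z ** M)
        for M in range(1, terms + 1)
    )
    return q / p, z * cmath.exp(-series)


def constant_ratio(angles):
    """Return (q_0 / p_0, exp(-i*pi*b_0)) for the same edges."""
    odd, even = angles[0::2], angles[1::2]
    q0 = p0 = 1 + 0j
    for a in odd:
        q0 *= -cmath.exp(1j * a)   # constant term of prod (z - z_j)
    for a in even:
        p0 *= -cmath.exp(1j * a)
    b0 = (sum(even) - sum(odd) + math.pi) / math.pi
    return q0 / p0, cmath.exp(-1j * math.pi * b0)
```

The series converges geometrically for $|z| > 1$, so a moderate number of terms already reproduces $Q_{N+1}(z)/P_N(z)$ to rounding accuracy at a point such as $z = 3 + i$.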

This set of (relatively) classical identities allows us to represent the general OPWM problem of representing an arbitrary harmonic waveform $\psi(t)$ in terms of the (periodic) $\pm 1$ pulse train as a problem of 2-point Padé approximation (or of polynomials orthogonal on the unit circle). Namely, for a harmonic polynomial $\psi(t)$ with $2N + 1$ arbitrary coefficients $B_j$, $j = 0, \ldots, N$; $A_j$, $j = 1, \ldots, N$:

$$\psi(t) = B_0 + \sum_{n=1}^{N} A_n \sin nt + B_n \cos nt,$$

we have the theorem characterizing the $\pm 1$ pulse trains with edges $a_i$ that have the same leading $2N + 1$ coefficients in the Fourier expansion:


Theorem. In order for the periodic pulse train $f(t)$, as above, to be a harmonic approximation to the waveform $\psi(t)$ to within the first $2N+1$ harmonic terms, it is necessary and sufficient that the rational function

$$\frac{Q_{N+1}(z)}{P_N(z)}$$

with $Q_{N+1}(z) = \prod_{j=1}^{N+1} (z - z_{2j-1})$, $P_N(z) = \prod_{j=1}^{N} (z - z_{2j})$, be a solution to the following 2-point Padé approximation problem:

$$\frac{Q_{N+1}(z)}{P_N(z)} = z \cdot f_\infty(z) + O(z^{-N}) \quad \text{at } z = \infty,$$

$$\frac{Q_{N+1}(z)}{P_N(z)} = f_0(z) + O(z^{N+1}) \quad \text{at } z = 0,$$

where the functions $f_\infty(z)$, $f_0(z)$ are "conjugate" generating functions of the Fourier coefficients of $\psi(t)$:

$$f_\infty(z) = e^{-\sum_{M=1}^{\infty} \frac{S_M}{M z^M}}, \qquad S_M = \frac{-M\pi i}{2}\,(B_M + i A_M) + 1, \quad M = 1, \ldots;$$

$$f_0(z) = e^{S_0^* - \sum_{M=1}^{\infty} \frac{S_M^* z^M}{M}}, \qquad S_0^* = -\pi i B_0, \quad S_M^* = \frac{M\pi i}{2}\,(B_M - i A_M) + 1, \quad M = 1, \ldots.$$


This particular class of Padé approximations is closely related to a classical class of orthogonal polynomials: the polynomials orthogonal on the unit circle, studied by Szegő and others. They possess many important properties; one of particular use in computations is the existence of a three-term recurrence relation connecting the polynomials $Q_{N+1}(z)$, $P_N(z)$ for consecutive $N$.

This naturally assumes that we have a "full" harmonic expansion of the basic waveform $\psi(t)$; looking at its first $2N + 1$ coefficients defines the "pulse train approximation of order $N$", $f(t)$, with $N + 1$ up-ticks $a_{2j-1}$, $j = 1, \ldots, N+1$.

The three-term recurrence satisfied by the polynomials $Q_{N+1}(z)$ for $N = 0, \ldots$ and by the polynomials $P_N(z)$ for $N = 0, \ldots$ is the following one, familiar from the theory of polynomials orthogonal on the unit circle:

$$X_{N+1} = (z + D_N) \cdot X_N + z \cdot E_N \cdot X_{N-1}$$

(satisfied by $X_N = Q_{N+1}(z)$ and by $X_N = P_N(z)$). An important interpretation of this three-term recurrence is as a difference spectral problem of the second order (on $X_N$), where $z$ plays the role of the spectral parameter (in the complex plane). Such an interpretation allows us to use a Riemann boundary value problem and, after reformulating the Padé approximation problem as a matrix (here, $2 \times 2$) factorization problem, to get new numerical methods of solving it. We can determine the polynomials $Q_{N+1}(z)$, $P_N(z)$ (starting from the original Fourier coefficients $c_n$ of $\psi(t)$) in only $O(N \log N)$ operations. The roots $z_j$ (and thus the $a_j$, whenever they are real) can be determined in $O(N \log^2 N)$ operations.

We present several simple examples of OPWM approximations to trigonometric polynomials. Graphs show the trigonometric polynomial $\psi(t)$ and the corresponding pulse train $f(t)$ on $[0, 2\pi]$.

Fig. 1. $\psi(t) = 1/2 + (8\sin(t))/(2\pi)$, $f(t)$, $N = 24$


14.2 Soliton solution of the OPWM Problem 


The most interesting thing about the OPWM solution, at least in the mathematical sense, is its deep relationship to completely integrable Hamiltonian (classical and quantum) systems of isospectral origin. This relationship provides a different characterization of pulse edges in terms of the discrete spectrum information in the scattering matrix associated with soliton-type solutions. The appearance of an infinite family of commuting Hamiltonian flows (representing first integrals of the completely integrable Hamiltonian) is natural, because the OPWM problem depends on the Fourier coefficients $a_i$, $b_i$ as independent variables (representing commuting time flows).


Fig. 2. $\psi(t) = -(6\sin(t) + 6\sin(2t) + 4\sin(3t) + 3\sin(5t))/(6\pi)$, $f(t)$, $N = 836$

Fig. 3. $\psi(t) = (60\sin(t) + 60\sin(2t) - 70\sin(3t) + 15\sin(4t) - 18\sin(5t))/(60\pi)$, $f(t)$, $N = 36$


The first example of such a "soliton" representation of the OPWM solution that we studied arose from the hierarchy of Korteweg-de Vries equations. The KdV case arises from the quarter-period waveforms that have odd sine terms only:

$$\psi(t) = \sum_{n=1,\ n\ \mathrm{odd}}^{2N-1} A_n \sin(nt).$$

The corresponding moment problem in the algebraic form is the problem of "the sum of odd powers":

$$\sum_{i=1}^{N} x_i^{2m-1} = s_{2m-1}, \quad m = 1, \ldots, N.$$

The identification of the time variables $t_m$ of the $m$-th order KdV Hamiltonians $H_m$ with the free parameters is relatively simple:

$$t_m = \frac{s_{2m-1}}{(2m-1)\,2^{2m-1}}, \quad m = 1, 2, \ldots,$$

where $t_1 = x$ is the spatial variable of the KdV equation.

The fact that the variation of the PWM problem in this case is governed by KdV equations allows us to derive various approximations to specific classes of PWM expansions. For example, in the most important PWM case of a pure sine wave of arbitrary amplitude $a$, one gets a difference-differential equation in $n$ and $a$ that can be used for efficient and rapid numerical integration. This and other consequences of soliton studies allow for fast and stable numerical methods of computing the pulse edges $a_i$ in the PWM problem even for very large values of $N$.

Without this additional information the case of large values of $N$ had been considered impractical to study using conventional techniques for the numerical solution of a system of $N$ simultaneous transcendental equations in $N$ unknowns $\alpha_i$. The reasons for the difficulties include the ill-conditioning of the Jacobian matrices in the neighborhood of a generic point in the phase space (these matrices are of Hankel type and resemble the famous Hilbert matrices). Straightforward application of conventional computations of Padé approximants is also infeasible for large $N$, due to the presence of a very large number of singular values in the corresponding Hankel matrices. Only because of the control of commuting difference and differential operators, whose eigenfunctions are the polynomials $P_n(x)$ in the PWM problem, is a stable determination of the polynomials $P_n(x)$ and their zeroes $x_i = \cos\alpha_i$ possible for moderate and large $N$.
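
The severity of this ill-conditioning can be seen on the Hilbert matrix itself, used here only as a stand-in for the Hankel-type Jacobians mentioned above (the actual PWM Jacobians are not reproduced). The sketch computes exact determinants with rational arithmetic; `hilbert` and `determinant` are hypothetical helper names.

```python
from fractions import Fraction


def hilbert(n):
    """n x n Hilbert matrix H[i][j] = 1 / (i + j + 1); it is a Hankel matrix."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size):
                m[r][k] -= factor * m[col][k]
    return det
```

Already `determinant(hilbert(4))` equals $1/6048000$, and the determinants keep shrinking super-exponentially with the size, which is why fixed-precision linear algebra on such systems loses all accuracy for moderate $N$.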




15 The Boundary Value Riemann Problem 


The key to the solution of the general Padé approximation problem, as well as to the problem of its isospectral deformation, is provided by boundary value problems and by the integral equations associated with them.

A classical boundary value problem of complex analysis is called the Riemann problem. It can be formulated in the following way. Let $\Gamma$ be a closed contour (possibly containing $\infty$) in the complex $\lambda$-plane and let $G(\lambda)$ be a matrix function (of order $m$) defined on $\Gamma$. We need to find a matrix function $\psi_1(\lambda)$ analytic inside the contour and a matrix function $\psi_2(\lambda)$ analytic outside the contour such that $\psi_1$ and $\psi_2$ satisfy the following condition on the contour:

$$\psi_1 \psi_2 = G(\lambda).$$

The solution of a regular Riemann problem can be reduced to the solution of a system of singular integral equations on the contour $\Gamma$. Let the Riemann problem be normalized at a point $\lambda_0$, e.g. outside the contour $\Gamma$, and let

$$\psi_2(\lambda_0) = g.$$

We are looking for the solution of the Riemann problem in the form


φί(ξ) 


vp  =h+ EN 


dg 


inside the contour, and 


vl) 
»ξ--λ 


outside the contour. Then the normalization condition yields 


_ ff φίξ) 
[ ξ -- 20 ἀξ 


Thus we arrive at a singular integral equation on ¢ for λε I 


φ(ξ) 
γξ-- λο 


y2=h+ dg 


——-— d€ + mip(A)T (A) ἘΠ: τ τα = = 0, 


where 7 = Gar The solution of this integral equation solves the (regular) 
Riemann problem 




16 Isospectral Deformation Equations 


The appropriate isospectral deformation equations arising from the general OPWM problem are the same ones that appear in the chiral $\sigma$-model and, in particular, in the sine-Gordon equation. These equations, like other isospectral deformation equations, arise from the consistency conditions between (matrix) differential operators depending on a parameter $\lambda$ (typically called a spectral parameter). The general form of these consistency-type differential spectral problems is

$$\Bigl(\frac{\partial}{\partial \xi} - U(\lambda)\Bigr)\varphi = 0, \qquad \Bigl(\frac{\partial}{\partial \eta} - V(\lambda)\Bigr)\varphi = 0,$$

where $U = U(\lambda)$, $V = V(\lambda)$ are $m \times m$ matrices depending rationally on the (spectral) parameter $\lambda$. The consistency condition for these equations is known as a "Lax" pair and is

$$\frac{\partial U}{\partial \eta} - \frac{\partial V}{\partial \xi} + [U, V] = 0.$$

This is actually a system of nonlinear partial differential equations on the residues of $U(\lambda)$, $V(\lambda)$ at their poles, which are independent of $\eta$, $\xi$.
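
The consistency condition follows from equating mixed derivatives. The short derivation below is a standard computation added for illustration; it assumes the pair of spectral problems is read as $\varphi_\xi = U\varphi$, $\varphi_\eta = V\varphi$, with subscripts denoting partial derivatives.

```latex
\begin{aligned}
\partial_\eta(\partial_\xi \varphi) &= \partial_\eta (U\varphi) = U_\eta \varphi + U V \varphi,\\
\partial_\xi(\partial_\eta \varphi) &= \partial_\xi (V\varphi) = V_\xi \varphi + V U \varphi,\\
0 &= \partial_\eta \partial_\xi \varphi - \partial_\xi \partial_\eta \varphi
   = \bigl( U_\eta - V_\xi + [U, V] \bigr)\,\varphi .
\end{aligned}
```

Since this must hold for a fundamental matrix of solutions $\varphi$, the bracketed matrix vanishes identically in $\lambda$, and comparing its pole parts gives the equations on the residues.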


References 


[AMM] Airault, H., McKean, H.P., Moser, J.: Rational and elliptic solutions of the Korteweg-de Vries equation and a related many-body problem. Comm. Pure Appl. Math., 30, 95-148, (1977)

[BGY] Brent, R. P., Gustavson, F. G., Yun, D. Y. Y.: Fast solution of Toeplitz systems of equations and computation of Padé approximants. J. Algorithms, 1, 259-295, (1980)

[CA74] Candy, J.: A use of limit cycle oscillations to obtain robust analog-to-digital 
converters. IEEE Trans. Commun., COM-22, 298-305, (1974) 

[CA97] Candy, J.: An overview of basic concepts. In: Delta-Sigma Converters. IEEE 
Press, 1-43, (1997) 

[CCCS] Chudnovsky, D.V., Chudnovsky, G.V., Czarkowski, D., Selesnick, I.: Solving the Optimal PWM Problem for Single-Phase Inverters. IEEE Trans. Circuits and Systems - I, 49, 465-475, (2002)

[Ch1] Chudnovsky, D.V., Chudnovsky, G.V.: Transcendental methods and theta-functions. Proc. Symp. Pure Mathematics, AMS, Providence, RI, v. 49, part 2, 167-232, (1989)

[Ch2] Chudnovsky, D.V., Chudnovsky, G.V.,: Explicit Continued Fractions and 
Quantum Gravity, Acta Applic. Math, Kluwer, Netherlands, 36, 167-185, (1994) 


[Ch3] Chudnovsky, D.V., Chudnovsky, G.V.: Laws of composition of Bäcklund transformations and the universal form of completely integrable systems in dimensions two and three. Proc. Natl. Acad. Sci. USA, 80, 1774-1777, (1983)

[Ch4] Chudnovsky, D.V., Chudnovsky, G.V.: Pole expansions of nonlinear partial differential equations. Nuovo Cimento, 40B, 339-353, (1977)

[Ch5] Chudnovsky, D.V., Chudnovsky, G.V.: On expansion of algebraic functions in power and Puiseux series, I, II. J. Complexity, 2, 271-294, (1986); 3, 1-25, (1987)

[CRGE] Craven, P., Gerzon, M.: Lossless coding for audio discs. J. Audio Eng. Soc., 
706-719 (1996) 

[CU] Cutler, C.: Transmission systems employing quantization. US Patent 
2,927,962, (1960) 

[GR87] Gray, R.: Oversampled sigma-delta modulation. IEEE Trans. Commun., COM-35, 481-489, (1987)

[GR89] Gray, R.: Spectral analysis of quantization noise in single-loop sigma-delta modulation with dc inputs. IEEE Trans. Commun., COM-36, 588-599, (1989)

[GR97] Gray, R.: Quantization noise in Delta-Sigma A/D converters. In: Delta- 
Sigma Converters, IEEE Press, 44-74, (1997) 

[GRCW] Gray, R., Chou, W., Wong, P.: Quantization noise in single-loop sigma- 
delta modulation with sinusoidal inputs. IEEE Trans. Inform. Theory, IT-35, 
956-968, (1989) 

[GRND] Gray, R., Neuhoff, D.: Quantization. IEEE Trans. Inform. Theory, IT-44, 
2325-2375, (1998) 

[GC] Gunturk, C.: Improved error estimates for first order sigma-delta systems. In: International Workshop on Sampling Theory and Applications (SampTA'99), Norway, (1999)

[IY] Inose, H., Yasuda, Y.: A unit bit coding method by negative feedback. Proc. IEEE, 51, 1524-1535, (1963)

[K1] Knuth, D.: The Art of Computer Programming, v. 2, Addison-Wesley, (1981)

[Li] Littlewood, D. E.: The Theory of Group Characters, Oxford University Press, 
(1958) 

[M] MacMahon, P.A.: Combinatory Analysis, v.I,II, Cambridge University Press, 
(1915) 

[NST] S. Norsworthy, R. Schreier, G.Temes, (eds.), Delta-Sigma Data Converters. 
Theory, Design and Simulation, IEEE Press, (1997) 

[PH] Patel, H. S., Hoft, R. G.: Generalized technique of harmonic elimination and 
voltage control in thyristor inverters: Part I harmonic elimination. IEEE Trans. 
Ind. Applicat., 310-317, (1973) 

[S] Szegő, G.: Orthogonal Polynomials, AMS, Providence, RI, (1978)

[Z] Falqui, G., Magri, F., Pedroni, M., Zubelli, J.P.: An elementary approach to the polynomial $\tau$-functions of the KP hierarchy. Theor. Math. Physics, 122, 17-28, (2000)


Use of Padé Approximations in Spline 
Construction 


David V. Chudnovsky¹ and Gregory V. Chudnovsky²


¹ IMAS, Polytechnic University, Brooklyn, 6 MetroTech Center, NY 11201
david@imas.poly.edu

² IMAS, Polytechnic University, Brooklyn, 6 MetroTech Center, NY 11201
gregory@imas.poly.edu


1 Introduction. 


This paper deals with the important practical problem of constructing the "best" spline approximation to a curve or a surface. Needless to say, this problem has many practical applications, especially in data modeling and data fitting. In this paper we deal with curves, in the one- and multi-dimensional cases, and, in particular, with closed curves. The main methods of this paper are the methods of (generalized) Padé approximations and orthogonal polynomials on the interval and the unit circle, and related methods of mechanical quadrature formulas. For references to approaches to the spline reconstruction of a data curve, where a variety of "fitting" figures of merit is used, see [KH], [MW], [MP], [W], [M95], [C95], [M97], [MM], [LM94], [Z].

The basic approach we use in this paper is that of a proper "spectral" match of the spline approximation to the original data curve. This corresponds to the assumption of perceived similarity between objects based on matching according to a certain band-limited frequency characteristic curve. This assumption is used to justify the use of band-limited information extracted from the original data curve to match the (polynomial) spline approximation to such a curve. Even in the simplest (but the most important) case of matching the data curve with a piecewise-constant approximation, the concept of band-limited approximation leads to interesting generic observations about the number of independent degrees of freedom of the original data that can be captured in the approximations. The simplest data fit is that of fixed locations of the $N$ knots of the spline (sample points), where the values can be arbitrary numbers matching $N$ (real) spectral coefficients of the original data. This subject is well covered by the Nyquist theorem and related studies of band-limited functions sampled, typically, at $N$ uniformly distributed points. On the other hand, we have recently studied so-called PWM approximations, where the sample points are arbitrary (in a given interval of observations) but the values are limited to only 2: we approximate data with pulse trains of $N$


D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 


arbitrary width but equal height pulses whose spectral characteristics match $N$ (real) spectral characteristics of the original data (see, e.g., [Ch99]). In this paper we consider the case of variable locations of the sample points (spline knots) and variable values of the approximating function. This gives a total of $2N$ degrees of freedom for $N$ knots (sample points): "twice the Nyquist limit".

The proposed method uses an analytic construction of the spline curve that best approximates the data curve by matching the maximum number of Fourier coefficients (trigonometric moments). It is based on the technique of generalized (two-point) Padé approximations and orthogonal polynomials on the unit circle, and is very similar to the PWM technique from [Ch99].

We also describe and generalize a somewhat related approach of constructing splines matching the monomial moments of the approximated function on the interval, studied in a series of papers starting from [G84]. In the case of an approximation of a single function $f(t)$ on the intervals $[0, \infty)$ and $[a, b]$ by splines that match the maximum number of polynomial moments on these intervals, the reduction to the classical (Gauss) problem of mechanical quadratures was obtained by Gautschi [G84], Gautschi, Milovanović, and Frontini [F87], and Gautschi, Milovanović [G86] (see also Micchelli [M88] and related work in [MK88], [M00]). Precise formulations of these results are presented below. An important and related problem of periodic spline approximations was left open in the publications quoted above, and is solved here.

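A spline function of degree $m \ge 0$ with $n$ distinct knots $\tau_1, \tau_2, \ldots, \tau_n$ in the interior of $[0, 1]$ can be written in terms of truncated powers in the form

$$s_{n,m}(t) = p_m(t) + \sum_{\nu=1}^{n} a_\nu (\tau_\nu - t)_+^m, \quad 0 \le t \le 1,$$

where the $a_\nu$ are real numbers and $p_m$ is a polynomial of degree $\le m$.

Problem. Determine the spline $s_{n,m}$ such that

$$\int_0^1 t^j s_{n,m}(t)\,dt = \int_0^1 t^j f(t)\,dt$$

holds for $j = 0, 1, \ldots, 2n-1$ and

$$s_{n,m}^{(k)}(1) = f^{(k)}(1), \quad k = 0, 1, \ldots, m.$$

Here it is assumed that $f$ has $m$ derivatives at $t = 1$, all being known.

Define

$$\mu_j = \frac{(m+j+1)!}{m!\,j!} \int_0^1 t^j \Bigl[ f(t) - \sum_{k=0}^{m} \frac{f^{(k)}(1)}{k!} (t-1)^k \Bigr] dt$$

for $j = 0, 1, \ldots, 2n-1$. This gives rise to a linear functional $\mathcal{L}$ on polynomials of the form $t^{m+1} p(t)$, $p \in \mathbb{P}_{2n-1}$ ($\mathbb{P}_d$ is the set of polynomials in $t$ of degree at most $d$), defined by its values on monomials:

$$\mathcal{L}(t^{m+1} \cdot t^j) = \mu_j, \quad j = 0, 1, \ldots, 2n-1,$$

and the inner product

$$\langle p, q \rangle = \mathcal{L}(t^{m+1} \cdot p \cdot q), \quad p \cdot q \in \mathbb{P}_{2n-1}.$$

The orthogonal polynomial $\pi_n$ is defined by

$$\deg \pi_n = n, \quad \pi_n = t^n + \ldots,$$

$$\langle \pi_n, q \rangle = 0 \quad \text{for all } q \in \mathbb{P}_{n-1}.$$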


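Theorem. There exists a unique spline function on $[0, 1]$,

$$s_{n,m}(t) = p_m(t) + \sum_{\nu=1}^{n} a_\nu (\tau_\nu - t)_+^m, \quad 0 < \tau_\nu < 1, \quad \tau_\nu \ne \tau_\mu \text{ for } \nu \ne \mu,$$

satisfying the moment equations of the Problem above if and only if the orthogonal polynomials $\pi_n$ above exist, are unique, and have $n$ distinct real zeroes $\tau_\nu^{(n)}$, $\nu = 1, 2, \ldots, n$, all contained in the open interval $(0, 1)$. The knots $\tau_\nu$ of $s_{n,m}$ are then precisely these zeroes, the polynomial $p_m$ is given by

$$p_m(t) = \sum_{k=0}^{m} \frac{f^{(k)}(1)}{k!} (t - 1)^k,$$

and the coefficients $a_\nu$ are obtained uniquely from the linear system

$$\mathcal{L}_0(t^{m+1} p) = \mathcal{L}(t^{m+1} p) \quad \text{for all } p \in \mathbb{P}_{n-1},$$

where

$$\mathcal{L}_0(g) = \sum_{\nu=1}^{n} a_\nu\, g(\tau_\nu), \qquad \tau_\nu = \tau_\nu^{(n)}.$$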
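
A small worked instance may help fix the construction. The sketch below is an illustration under stated assumptions: it takes the test function $f(t) = (1-t)^2$, the piecewise-constant case $m = 0$ and $n = 2$ knots, and it uses the closed form $\mu_j = 2/((j+2)(j+3))$, which holds for this particular $f$ with $\mu_j = (j+1)\int_0^1 t^j (f(t) - f(1))\,dt$. All function names are hypothetical.

```python
import math


def step_spline_two_knots():
    """Knots and coefficients of the moment-preserving step spline for f(t) = (1 - t)**2."""
    # Moments mu_0..mu_3 of the functional, in closed form for this f and m = 0.
    mu = [2.0 / ((j + 2) * (j + 3)) for j in range(4)]
    # Monic pi_2(t) = t^2 + alpha*t + beta with <pi_2, 1> = <pi_2, t> = 0:
    #   mu_2 + alpha*mu_1 + beta*mu_0 = 0,   mu_3 + alpha*mu_2 + beta*mu_1 = 0.
    det = mu[1] * mu[1] - mu[0] * mu[2]
    alpha = (mu[0] * mu[3] - mu[1] * mu[2]) / det
    beta = (mu[2] * mu[2] - mu[1] * mu[3]) / det
    root = math.sqrt(alpha * alpha - 4.0 * beta)
    tau = [(-alpha - root) / 2.0, (-alpha + root) / 2.0]  # zeroes of pi_2 = knots
    # Weights w_v = a_v * tau_v from  sum w_v = mu_0,  sum w_v * tau_v = mu_1.
    w2 = (mu[1] - mu[0] * tau[0]) / (tau[1] - tau[0])
    w = [mu[0] - w2, w2]
    a = [w[i] / tau[i] for i in range(2)]
    return tau, a


def spline_moment(tau, a, j):
    """int_0^1 t^j s(t) dt for s(t) = sum of a_v over knots tau_v > t (here p_0 = f(1) = 0)."""
    return sum(a_v * t_v ** (j + 1) for a_v, t_v in zip(a, tau)) / (j + 1)


def target_moment(j):
    """int_0^1 t^j (1 - t)^2 dt."""
    return 2.0 / ((j + 1) * (j + 2) * (j + 3))
```

Here $\pi_2(t) = t^2 - t + 1/5$, so the knots are $(1 \mp 1/\sqrt{5})/2$, and the resulting two-step function reproduces the four moments $j = 0, \ldots, 3$ of $f$, i.e. $2n$ moments with only $n = 2$ knots.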


More complicated spline approximations, where the degrees of the piecewise approximations vary among the knots, can also be described by generalized orthogonal polynomials and Turán's quadrature formulas (see [M00]).

We generalize these and other similar results below to the case of approximating spatial $d$-dimensional curves $g(t)$ by spatial splines defined on the interval that match the maximum (allowed) number of moments on that interval. For this we use the technique of Padé approximations of the Second Kind and the related (multi-index) orthogonal polynomials of the Second Kind.

The main problem with moment-matching (moment-preserving) spline approximations lies in the instabilities of moment problem calculations in the cases of moments on infinite and finite intervals. Some of these problems are related to the ill-conditioning of the related Hankel matrices, and some are related to singularities arising from non-normal cases and from cases when the roots are not real or lie off the interval that supports the measure.

The trigonometric moments, and related polynomials with zeroes on the 
unit circle do not have such instabilities; they are more advantageous for con- 
struction of spline approximations. They also provide a unique framework for 
the construction of periodic splines approximating periodic functions (e.g., 
closed planar or spatial curves). Our results solve the problem of approxi- 
mating periodic functions (curves) by periodic splines, left open in previous 
studies. The technique here differs considerable from the traditional approach 
of Gauss’s mechanical quadratures (and their generalizations), and involves 
two-point Padé approximations (at z = 0 and z = oo) and quadratures at the 
unit circle. 

Unlike the corresponding figure of merit based on matching of moments
on the interval, the figure of merit based on matching of trigonometric
moments always has "the best" solution (with the minimal number of knots).
This solution is quite practical, fast, and also correctly matches
piecewise-polynomial input data. The solution is easily generalized to the
case of (closed) multi-dimensional curves.

There is no obvious generalization of this method to the surfaces and 
other multi-dimensional manifolds due to the absence of any useful general 
multi-dimensional cubature formulas and related multi-dimensional orthog- 
onal polynomials. Though there are some specific cubature formulas (like 
Radon's two-dimensional theorem), they seem to work only for very specific
numbers of knots, and cannot be used with, say, adding one extra knot at a
time. Nevertheless, one can use our approach with the curves on
multi-dimensional manifolds to find the best knots on the "scan-curves" of the
manifolds, constructing this way multi-dimensional spline patches.


2 Multi-dimensional Spline Curves and Padé 
Approximations. 


2.1 Simultaneous Padé Approximations of the Second Kind. 


To deal with the problem of analytic construction of d-dimensional spline 
curves of arbitrary degree m, we apply the methods of simultaneous Padé 
approximations of the Second Kind, see [Ch85]. Just as the ordinary Padé 
approximants are expressed in terms of (general) orthogonal polynomials, si- 
multaneous Padé approximants of the Second Kind are also expressed in terms 
of simultaneous orthogonal polynomials (or a solution to a multiple moments 
problem). Here are the alternative definitions. 

Definition of One-Point Padé approximations of the Second Kind.
Let $f_1(z), \dots, f_d(z)$ be a system of functions defined by their Taylor
expansions at $z = \infty$. Let $n = (n_1, \dots, n_d)$ be a multi-index with
non-negative $n_i$, $i = 1, \dots, d$. Then the polynomial $P_n(z)$ in z of
degree (at most) $|n| = n_1 + \dots + n_d$ is called a Padé approximant of the
Second Kind to $f_1(z), \dots, f_d(z)$ at $z = \infty$ of the weight n if
there are polynomials $Q_i(z)$, $i = 1, \dots, d$, in z of degree at most
$|n|$ such that the following approximation holds.

The Laurent expansion at $z = \infty$ of $Q_i(z)/P_n(z)$ coincides with the
Taylor expansion of $f_i(z)$ up to (and including) the term $z^{-|n|-n_i}$,
for all $i = 1, \dots, d$.

An alternative definition of these approximants involves orthogonal
polynomials of the Second Kind. Measures defining these polynomials can have
support on unions of arbitrary arcs in the complex plane. However, for a
typical application that we study here, the support of all d measures is
within the same interval [a, b], which can be chosen, without any loss of
generality, to be [0,1]. We consider d measures $d\mu_i(t)$ on [0,1], where
often these measures are defined by means of weight functions $w_i(t)\,dt$
(for $i = 1, \dots, d$). These measures produce the simultaneous problem of
moments and the class of orthogonal polynomials of the Second Kind on the
interval, as in the following definition.

Definition of Orthogonal Polynomials of the Second Kind. Let
$d\mu_i(t)$ be d measures on the interval [0,1] ($i = 1, \dots, d$). Let
$n = (n_1, \dots, n_d)$ be a multi-index with non-negative $n_i$,
$i = 1, \dots, d$. Then the polynomial $P_n(t)$ in t of degree (at most)
$|n| = n_1 + \dots + n_d$ is called an orthogonal polynomial of the Second
Kind with respect to measures $d\mu_1(t), \dots, d\mu_d(t)$ on [0,1] of the
weight n if the following orthogonality relations hold:

\[ \int_0^1 t^j P_n(t)\, d\mu_i(t) = 0, \quad j = 0, \dots, n_i - 1, \]

for $i = 1, \dots, d$.

The relationship between the (one-point) Padé approximants and orthogonal
polynomials of the Second Kind is fairly straightforward, just as in the
classical case d = 1 of the classical moment problem. For this one converts
the (measure) moments into their generating function, expanded at
$z = \infty$, as follows. For $i = 1, \dots, d$, we define the generating
function $f_i(z)$ in terms of the moments of the measure $d\mu_i(t)$ on [0,1]:

\[ f_i(z) = \sum_{k=0}^{\infty} \frac{m_{k,i}}{z^{k+1}}, \]

where

\[ m_{k,i} = \int_0^1 t^k\, d\mu_i(t), \quad k = 0, 1, \dots \]

In these notations the problems of constructing simultaneous orthogonal
polynomials and (one-point) Padé approximants of the Second Kind for the
multi-index n are equivalent.




The construction of the polynomial $P_n$ is reduced to a problem of linear
algebra via a standard Gram-Schmidt reduction, and an inversion of the
(generalized) Hankel matrix. There are two important conditions on the
measures that are often imposed. The first one is the "normality condition"
of the Padé approximations. This condition means that the polynomial $P_n$ is
unique (up to a multiplicative constant factor), and has the degree exactly
$|n|$ for all multi-indices n. This condition is equivalent to the condition
of non-singularity of all corresponding (generalized) Hankel matrices. The
"normality condition" holds generically, and its violation means that there
are non-trivial relations between the approximating functions. The simplest
necessary condition for "normality" is the linear independence of functions
$f_i(z)$, $i = 1, \dots, d$, over $\mathbb{C}(z)$ (or $\mathbb{R}(z)$).
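
As an illustration of the linear-algebra reduction just described, here is a minimal Python sketch that computes a monic polynomial of the Second Kind directly from the orthogonality relations, assuming normality (a non-singular generalized Hankel system). The function name `second_kind_polynomial`, the moment layout, and the example weights $w_1(t) = 1$, $w_2(t) = t$ are assumptions chosen for illustration, not part of the original text.

```python
import numpy as np

def second_kind_polynomial(moments, n):
    """Monic P_n(t) = t^N + a_{N-1} t^{N-1} + ... + a_0, with N = |n|.

    moments[i][k] is the k-th moment m_{k,i} of the i-th measure
    (at least N + n_i moments are needed); n is the multi-index.
    Returns the coefficients [a_0, ..., a_{N-1}, 1], lowest degree first.
    """
    N = sum(n)
    rows, rhs = [], []
    for m_i, n_i in zip(moments, n):
        for j in range(n_i):
            # orthogonality to t^j: sum_l a_l * m_{j+l,i} = -m_{j+N,i}
            rows.append([m_i[j + l] for l in range(N)])
            rhs.append(-m_i[j + N])
    a = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(a, 1.0)

# Hypothetical example: weights w_1(t) = 1 and w_2(t) = t on [0, 1].
K = 8
m1 = [1.0 / (k + 1) for k in range(K)]   # moments of dt
m2 = [1.0 / (k + 2) for k in range(K)]   # moments of t dt
coeffs = second_kind_polynomial([m1, m2], (1, 1))
```

For this particular pair of weights the multi-index (1, 1) reproduces the monic shifted Legendre polynomial $t^2 - t + 1/6$, since orthogonality to 1 with respect to $t\,dt$ is the same as orthogonality to t with respect to $dt$.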

The second condition concerns the distribution of zeros of orthogonal
polynomials. Under very mild conditions on the behavior of functions $f_i(z)$
(related to their singularities), the distribution of zeros of polynomials
$P_n(t)$ approaches the union of supports of measures $d\mu_i(t)$. Stronger
conditions ensure that all zeros of polynomials $P_n(t)$ are simple and are
located in the interval [0,1]. For d = 1 one such condition is the condition
that the weight w(t) does not change sign on the interval [0,1]. For d > 1
the conditions on weights $w_i(t)$ are stronger, of the form of the Chebyshev
system.

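Since the construction of orthogonal polynomials implies the solution of
linear equations, these polynomials can be expressed in terms of determinants
of generalized Hankel matrices. Assuming normality, the monic orthogonal
polynomial $P_n(t)$ corresponding to a multi-index $n = (n_1, \dots, n_d)$ of
degree $|n|$ can be expressed as a ratio of two determinants:

\[
P_n(t) = \frac{1}{D_n}
\begin{vmatrix}
m_{0,1} & m_{1,1} & \cdots & m_{|n|,1} \\
\vdots & \vdots & & \vdots \\
m_{n_1-1,1} & m_{n_1,1} & \cdots & m_{|n|+n_1-1,1} \\
\vdots & \vdots & & \vdots \\
m_{0,d} & m_{1,d} & \cdots & m_{|n|,d} \\
\vdots & \vdots & & \vdots \\
m_{n_d-1,d} & m_{n_d,d} & \cdots & m_{|n|+n_d-1,d} \\
1 & t & \cdots & t^{|n|}
\end{vmatrix},
\]

where

\[
D_n =
\begin{vmatrix}
m_{0,1} & m_{1,1} & \cdots & m_{|n|-1,1} \\
\vdots & \vdots & & \vdots \\
m_{n_1-1,1} & m_{n_1,1} & \cdots & m_{|n|+n_1-2,1} \\
\vdots & \vdots & & \vdots \\
m_{0,d} & m_{1,d} & \cdots & m_{|n|-1,d} \\
\vdots & \vdots & & \vdots \\
m_{n_d-1,d} & m_{n_d,d} & \cdots & m_{|n|+n_d-2,d}
\end{vmatrix}.
\]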


This representation, if used directly, often leads to ill-conditioned
calculations even for moderate n. Faster, and more stable, algorithms are
sometimes based on recurrence relations expressing $P_n(t)$. In the case
d = 1 the relation is a simple three-term linear recurrence, connecting 3
nearby monic orthogonal polynomials:

\[ P_n(t) = (t + b_n) \cdot P_{n-1}(t) - c_n \cdot P_{n-2}(t), \quad n = 2, \dots \]

This three-term recurrence also allows one to reduce the problem of finding
zeros of ordinary orthogonal polynomials to the problem of determining
eigenvalues of the tridiagonal matrices (formed from the three-term
recurrence relation).
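
The eigenvalue reduction can be sketched as follows in Python. This is only an illustration under stated assumptions: the recurrence is taken in the form $P_n = (t + b_n) P_{n-1} - c_n P_{n-2}$ with $P_0 = 1$ and all $c_n > 0$, and the example coefficients are those of the monic Legendre polynomials shifted to [0,1] (a standard fact, not taken from the text). The function names are hypothetical.

```python
import numpy as np

def zeros_from_recurrence(b, c):
    """Zeros of P_N for P_n(t) = (t + b_n) P_{n-1}(t) - c_n P_{n-2}(t).

    b = [b_1, ..., b_N], c = [c_2, ..., c_N] with all c_n > 0, P_0 = 1.
    The zeros are the eigenvalues of the symmetric tridiagonal matrix
    with diagonal -b_n and off-diagonal sqrt(c_n).
    """
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    off = np.sqrt(c)
    jacobi = np.diag(-b) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(jacobi)  # returned in ascending order

def shifted_legendre_recurrence(N):
    """Recurrence coefficients for the weight w(t) = 1 on [0, 1]."""
    k = np.arange(1, N)                      # k = n - 1 for n = 2..N
    b = np.full(N, -0.5)
    c = k**2 / (4.0 * (4.0 * k**2 - 1.0))
    return b, c

nodes = zeros_from_recurrence(*shifted_legendre_recurrence(5))
```

The computed `nodes` should agree with the Gauss-Legendre nodes mapped from [-1,1] to [0,1], which gives an independent check of the construction.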

In the case of d > 1, we have linear recurrences of the order d+1 (of length
d+2) connecting orthogonal polynomials of the Second Kind. The easiest way
to express these relations is to look only at diagonal or near-diagonal cases
of simultaneous approximations. Namely, for every N, we create the multi-index
$n(N) = (q+1, \dots, q+1, q, \dots, q)$ ($q+1$ is repeated r times, and q is
repeated $d - r$ times) for $N = q \cdot d + r$. Then orthogonal polynomials
of the Second Kind corresponding to multi-indices n(N) satisfy the recurrence
of order d+1:

\[ P_{n(N)} = t \cdot P_{n(N-1)} + \sum_{k=1}^{d+1} d_k \cdot P_{n(N-k)}, \quad N = d+1, \dots \]


2.2 Multi-dimensional Spline Curves and Moment Problems. 


In order to find an analytic solution to the problem of construction of
multi-dimensional spline curves using moment approximation we use the method
of simultaneous Padé approximations from the previous subsection.

Here we use as a figure of merit of the approximation of the
multi-dimensional curve $g(t) = (g_1(t), \dots, g_d(t))$ for t in [0,1] by a
spline of degree m the coincidence of the maximum allowed number of moments
in t. Here d is the dimension of the Euclidean space $\mathbb{R}^d$, where the
approximated curve g(t) lies. In order to avoid trivial cases that would
result in non-"normality" of Padé approximations, one should assume that the
curve g(t) is essentially d-dimensional, e.g. that the functions
$g_1(t), \dots, g_d(t)$ are linearly independent over $\mathbb{R}$. The
minimal assumption on g(t) is the integrability of $g_i(t)$ on [0,1].

We are looking at a d-dimensional spline sp(t) of degree m defined on [0,1]
that approximates g(t) in the sense of matching the highest (expected) number
of moments on [0,1]. Specifically, for a multi-index $n = (n_1, \dots, n_d)$,
we are looking at the d-dimensional spline of degree m,
$sp(t) = (sp_1(t), \dots, sp_d(t))$, defined on [0,1] with $N = |n|$ knots
$t_j$, $j = 1, \dots, N$, in [0,1] ($0 < t_1 < \dots < t_N < 1$), matching the
moments with g(t) on [0,1]. This matching can be formulated in one of the two
following statements (as in the one-dimensional case):

Problem M1. Determine the N-knot spline $sp(t) = (sp_1(t), \dots, sp_d(t))$
of degree m, such that

\[ \int_0^1 t^k \cdot sp_i(t)\, dt = \int_0^1 t^k \cdot g_i(t)\, dt, \quad k = 0, \dots, N + n_i + m, \]

for $i = 1, \dots, d$.

The second problem assumes that g(t) has the first m known derivatives
at t = 1, and we are trying to match these derivatives (at t = 1), in addition
to the maximal number of moments:

Problem M2. Determine the N-knot spline $sp(t) = (sp_1(t), \dots, sp_d(t))$
of degree m, such that

\[ \int_0^1 t^k \cdot sp_i(t)\, dt = \int_0^1 t^k \cdot g_i(t)\, dt, \quad k = 0, \dots, N + n_i - 1, \]

and

\[ sp_i^{(k)}(1) = g_i^{(k)}(1), \quad k = 0, \dots, m, \]

for $i = 1, \dots, d$.
Any d-dimensional spline of degree m, $sp(t) = (sp_1(t), \dots, sp_d(t))$,
defined on [0,1] with $N = |n|$ knots $t_j$, $j = 1, \dots, N$, can be
represented in the following form:

\[ sp_i(t) = p_i(t) + \sum_{j=1}^{N} a_{j,i}\, (t_j - t)_+^m, \]

where $p_i(t)$ are polynomials of degree at most m, for $i = 1, \dots, d$.
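
To make the truncated-power representation concrete, the following Python sketch evaluates one coordinate of such a spline. It assumes the form $p(t) + \sum_j a_j (t_j - t)_+^m$, where $(x)_+^m$ equals $x^m$ for $x > 0$ and 0 otherwise (for m = 0 this is a unit step equal to 1 to the left of the knot). The function name `eval_spline` and the sample data are hypothetical.

```python
import numpy as np

def eval_spline(t, poly_coeffs, knots, a, m):
    """Evaluate sp(t) = p(t) + sum_j a_j * (t_j - t)_+^m for one coordinate.

    poly_coeffs: coefficients of p(t), lowest degree first (degree <= m).
    knots, a:    knot positions t_j and the matching coefficients a_j.
    """
    t = np.asarray(t, dtype=float)
    value = np.polynomial.polynomial.polyval(t, poly_coeffs)
    for t_j, a_j in zip(knots, a):
        if m == 0:
            # (t_j - t)_+^0 is a unit step: 1 for t < t_j, 0 otherwise
            value = value + a_j * (t < t_j)
        else:
            value = value + a_j * np.maximum(t_j - t, 0.0) ** m
    return value

# Hypothetical quadratic spline with a single knot at t = 0.5
samples = eval_spline([0.0, 0.5, 1.0], [1.0, 2.0], [0.5], [3.0], 2)
```

Because every truncated power vanishes to the right of its knot, the spline coincides with p(t) near t = 1, which is what makes matching derivatives at t = 1 convenient in this basis.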
Problems M1-M2 of optimal construction of spatial splines sp(t),
approximating the original spatial curve g(t), can be reduced to the problem
of finding zeros of orthogonal polynomials of the Second Kind and related
(spatial) mechanical quadrature. The corresponding moment problems and
functions that are Padé approximated arise from (m + 1)-st differentiation,
when g(t) is, essentially, approximated by a linear combination of
delta-functions $\delta(t - t_j)$, centered at the spline's knots (zeros of
orthogonal polynomials of the Second Kind). This reduction is very similar to
the one-dimensional case, and involves mainly integration by parts.

We present the d-dimensional moment problems equivalent to Problems M1-M2.
These moment problems can actually be reduced to d-dimensional moment
problems with d weights on [0,1], provided that g(t) is
(m + 1)-differentiable: $g_i(t) \in C^{m+1}[0,1]$, $i = 1, \dots, d$.

First, the case of the Problem M1. The corresponding d sequences of moments
$\{m^1_{k,i}\}$, $k = 0, \dots$ are:

\[ m^1_{k,i} = \frac{(m+k+1)!}{m!\,k!} \int_0^1 t^k g_i(t)\, dt, \quad k = 0, \dots, \]

for $i = 1, \dots, d$. As usual, these moments extend by linearity to linear
functionals $\mathcal{L}^1_i$ defined on the set P of polynomials q(t) in t
using the following definition of $\mathcal{L}^1_i$ on monomials $t^k$:

\[ \mathcal{L}^1_i(t^k) = m^1_{k,i}, \quad k = 0, \dots, \]

for $i = 1, \dots, d$. These linear functionals are extended to the scalar
product, defined for all polynomials p(t) and q(t) from P:

\[ \langle p, q \rangle^1_i = \mathcal{L}^1_i\big( (1-t)^{m+1} \cdot p(t) \cdot q(t) \big). \]

The orthogonal polynomials of the Second Kind of the multi-index n
corresponding to scalar products $\langle \cdot, \cdot \rangle^1_i$,
$i = 1, \dots, d$, have zeros that correspond to knots $t_1, \dots, t_N$ for
$N = |n|$. To determine the parameters $a_{j,i}$, $j = 1, \dots, N$, and
coefficients of polynomials $p_i(t)$ one needs to introduce other linear
functionals, arising from (m + 1)-st differentiation of $g_i(t)$:

\[ \mathcal{L}_{0,i}(h) = \sum_{k=0}^{m} b_{k,i}\, h^{(k)}(1) + \sum_{j=1}^{N} a_{j,i}\, h(t_j), \]

where

\[ b_{k,i} = \frac{(-1)^{m-k}}{m!}\, p_i^{(m-k)}(1), \quad k = 0, \dots, m, \]

for $i = 1, \dots, d$.
Theorem. There exists a unique d-dimensional spline approximation sp(t),
solving the problem M1 for a given multi-index $n = (n_1, \dots, n_d)$ iff
there exists a unique orthogonal polynomial of the Second Kind $P_n(t)$ such
that

\[ \langle P_n, q \rangle^1_i = 0 \]

for all polynomials q(t) of degree $< n_i$ and all $i = 1, \dots, d$, having
$N = |n|$ real zeros $t_j$, $j = 1, \dots, N$, in the interval (0,1). The
parameters $a_{j,i}$ and polynomials $p_i(t)$ (defined by their expansion
coefficients $b_{k,i}$) are determined from the block-Vandermonde equations

\[ \mathcal{L}_{0,i}(t^{m+1} \cdot q) = \mathcal{L}^1_i(q) \]

for all polynomials q(t) of degree $< N + n_i$ and all $i = 1, \dots, d$.
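The proof follows from integration by parts that gives:

\[ \int_0^1 t^k (t_j - t)_+^m\, dt = \frac{k!\,m!}{(m+k+1)!}\, t_j^{m+k+1}, \]

\[ \int_0^1 t^k p_i(t)\, dt = \frac{k!\,m!}{(m+k+1)!} \sum_{j=0}^{m} b_{j,i} \left. \frac{\partial^j}{\partial t^j}\, t^{m+1+k} \right|_{t=1}, \]

and thus:

\[ \mathcal{L}_{0,i}(t^{m+1} \cdot t^k) = m^1_{k,i}, \quad k = 0, \dots, \]

for $i = 1, \dots, d$.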

Provided that g(t) is (m + 1)-differentiable: $g_i(t) \in C^{m+1}[0,1]$,
$i = 1, \dots, d$, one can replace the general moment problem for
$m^1_{k,i}$ by a problem, where the measure $d\mu_i(t)$ is a weight:

\[ d\mu_i(t) = t^{m+1} (1-t)^{m+1} \cdot \frac{(-1)^{m+1}}{m!}\, g_i^{(m+1)}(t)\, dt, \]

for $i = 1, \dots, d$. This means that the inner product
$\langle \cdot, \cdot \rangle^1_i$ can be represented as:

\[ \langle p, q \rangle^1_i = \int_0^1 p(t) \cdot q(t)\, d\mu_i(t), \quad i = 1, \dots, d. \]

Then the unique solution to the d-dimensional spline problem M1 exists for
a given multi-index n if and only if the corresponding orthogonal polynomial
of the Second Kind $P_n(t)$ is unique of degree $N = |n|$ (this is true under
the normality assumption), and its N roots are all real and lie in the
interval (0,1). Assumptions of normality of the function system g(t) are
necessary for this, and in the case of d = 1 the positivity of
$g_i^{(m+1)}(t)$ on [0,1] is a sufficient condition. It is satisfied when,
for example, $g_i(t)$ is a monotonic function on [0,1].

The solution to the problem M2 is very similar, but one introduces slightly
different moments and related linear functionals. Also the definition of the
polynomials $p_i(t)$, denoted as $p_i^2(t)$, is very simple in terms of values
of derivatives of $g_i(t)$ at t = 1:

\[ p_i^2(t) = \sum_{j=0}^{m} (t-1)^j\, \frac{g_i^{(j)}(1)}{j!} \]

for $i = 1, \dots, d$. The moments $\{m^2_{k,i}\}$ in the Problem M2 are:

\[ m^2_{k,i} = \frac{(m+k+1)!}{m!\,k!} \int_0^1 t^k \Big( g_i(t) - \sum_{j=0}^{m} (t-1)^j\, \frac{g_i^{(j)}(1)}{j!} \Big)\, dt, \quad k = 0, \dots, \]

and these moments extend by linearity to linear functionals
$\mathcal{L}^2_i$ defined on the set P of polynomials q(t) in t using the
following definition of $\mathcal{L}^2_i$ on monomials $t^k$:

\[ \mathcal{L}^2_i(t^k) = m^2_{k,i}, \quad k = 0, \dots, \]

for $i = 1, \dots, d$. These linear functionals are extended to the scalar
product, defined for all polynomials p(t) and q(t) from P:

\[ \langle p, q \rangle^2_i = \mathcal{L}^2_i\big( p(t) \cdot q(t) \big), \quad i = 1, \dots, d. \]

To determine the parameters $a_{j,i}$, $j = 1, \dots, N$, one needs to
introduce other linear functionals, arising from (m + 1)-st differentiation
of $g_i(t)$:

\[ \mathcal{L}_{0,i}(h) = \sum_{j=1}^{N} a_{j,i}\, h(t_j), \quad i = 1, \dots, d. \]
Theorem. There exists a unique d-dimensional spline approximation sp(t),
solving the problem M2 for a given multi-index $n = (n_1, \dots, n_d)$ iff
there exists a unique orthogonal polynomial of the Second Kind $P_n(t)$ such
that

\[ \langle P_n, q \rangle^2_i = 0 \]

for all polynomials q(t) of degree $< n_i$ and all $i = 1, \dots, d$, having
$N = |n|$ real zeros $t_j$, $j = 1, \dots, N$, in the interval (0,1). The
parameters $a_{j,i}$ are determined from the block-Vandermonde equations

\[ \mathcal{L}_{0,i}(t^{m+1} \cdot q) = \mathcal{L}^2_i(q) \]

for all polynomials q(t) of degree $< N + n_i$ and all $i = 1, \dots, d$.

Provided that g(t) is (m + 1)-differentiable: $g_i(t) \in C^{m+1}[0,1]$,
$i = 1, \dots, d$, one can replace the general moment problem for
$m^2_{k,i}$ by a problem, where the measure $d\mu_i(t)$, denoted as
$d\mu_i^2(t)$, is a weight:

\[ d\mu_i^2(t) = t^{m+1} \cdot \frac{(-1)^{m+1}}{m!}\, g_i^{(m+1)}(t)\, dt, \]

for $i = 1, \dots, d$.

The main problem with construction of moment-preserving spline
approximations on the interval (infinite or finite) to generic functions is
its inherent instability. The corresponding linear problems are notoriously
ill-conditioned and require a nearly quadratic increase with N in the
precision of computations, making them prohibitively difficult even for
moderate N. These difficulties are significantly less pronounced in the case
of periodic splines, or for other similar problems with functions defined on
(part of) the unit circle, when the knots (zeroes) lie on the unit circle.
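
The ill-conditioning mentioned above is easy to observe numerically. The following Python sketch, an illustration only, builds the Hankel moment matrix for the simplest weight w(t) = 1 on [0,1] (moments $m_k = 1/(k+1)$, so the matrix is the Hilbert matrix) and records its condition number for increasing size. The helper name `hankel_moment_matrix` is hypothetical.

```python
import numpy as np

def hankel_moment_matrix(N):
    """Hankel matrix (m_{j+l}), j, l = 0..N-1, for m_k = 1/(k+1)."""
    idx = np.arange(N)
    return 1.0 / (idx[:, None] + idx[None, :] + 1.0)

# 2-norm condition numbers for a few sizes
conds = {N: np.linalg.cond(hankel_moment_matrix(N)) for N in (2, 4, 6, 8, 10)}
```

The condition numbers grow roughly geometrically with N, so in double precision only a handful of knots can be computed reliably from power moments without extended-precision arithmetic.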




3 Trigonometric Moments Problem and Periodic Splines. 


3.1 Carathéodory Theorem. 


A crucial result used in the solution of the trigonometric moment problem is
known as the Carathéodory theorem:


Theorem (Carathéodory). For any N complex numbers $c_1, \dots, c_N$
there exists a unique $M \le N$, positive real numbers
$\rho_1, \dots, \rho_M$ and distinct real numbers
$\theta_1, \dots, \theta_M$ such that $-\pi < \theta_j \le \pi$,
$j = 1, \dots, M$, and

\[ c_k = \sum_{j=1}^{M} \rho_j\, e^{i k \theta_j}, \quad k = 1, \dots, N. \]


The proof of this Carathéodory theorem is important because the same
method is used in some of our proofs as well. We use the classical method
described in [GS]. First, one extends the sequence $\{c_k\}_{k=1,\dots}$ to
negative indices as follows: $c_{-k} = c_k^*$ for $k = 1, \dots$, where $c^*$
is the complex conjugate of c. Then we construct an (N+1) x (N+1)
"zero-diagonal" Toeplitz matrix T such that $T_{i,j} = c_{j-i}$ for
$i \ne j$ and $i, j = 0, \dots, N$, and $T_{i,i} = 0$ for $i = 0, \dots, N$.
Let us denote by $\lambda_0$ the smallest eigenvalue of T, and define (for the
first time in the context of the Carathéodory theorem) $c_0$ as
$c_0 = -\lambda_0$. Then one can define a "full" (N+1) x (N+1) Toeplitz
matrix $T_N$ with $(T_N)_{i,j} = c_{j-i}$ for $i, j = 0, \dots, N$.

We can define $M \le N$ such that $N + 1 - M$ is the multiplicity of the
eigenvalue $\lambda_0$. Then M is the rank of $T_N$. One looks then at the
leading (M+1) x (M+1) sub-matrix $T_M$ of $T_N$ with
$(T_M)_{i,j} = c_{j-i}$ for $i, j = 0, \dots, M$. Then $T_M$, as well as
$T_N$, has a zero eigenvalue. Let e be the eigenvector of $T_M$
corresponding to the zero eigenvalue, where $e = (e_0, \dots, e_M)$.

From e one can construct the polynomial $Q_e(z)$, defined via elements of
e, $Q_e(z) = \sum_{j=0}^{M} e_j z^j$. The polynomial $Q_e(z)$ of degree M has
only simple complex roots, and all these roots lie on the unit circle. We
denote them as $e^{i\theta_j}$ for $j = 1, \dots, M$. The weights $\rho_j$
($j = 1, \dots, M$) are defined from the Vandermonde system of equations:

\[ c_k = \sum_{j=1}^{M} \rho_j\, e^{i k \theta_j}, \quad k = 1, \dots, N. \]

The weights $\rho_j$ are positive, and $\sum_{j=1}^{M} \rho_j = c_0$. This
proves the Carathéodory theorem.
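
The steps of this proof translate almost line by line into a numerical procedure. The Python sketch below is a hedged illustration, not the authors' implementation: it assumes the convention $T_{i,j} = c_{j-i}$, $c_{-k} = c_k^*$, counts the multiplicity of the smallest eigenvalue with an ad hoc tolerance `tol`, and assumes the generic situation in which the null vector has a non-zero leading entry. The function name `caratheodory` is hypothetical.

```python
import numpy as np

def caratheodory(c, tol=1e-9):
    """Given c = [c_1, ..., c_N], return (c_0, theta, rho) with
    c_k = sum_j rho_j * exp(1j * k * theta_j), k = 1..N."""
    c = np.asarray(c, dtype=complex)
    N = len(c)

    def coef(k):
        # c_0 is a placeholder 0 here: T has a zero diagonal
        if k == 0:
            return 0.0
        return c[k - 1] if k > 0 else np.conj(c[-k - 1])

    T = np.array([[coef(j - i) for j in range(N + 1)] for i in range(N + 1)])
    lam = np.linalg.eigvalsh(T)                  # ascending eigenvalues
    c0 = -lam[0]
    scale = max(1.0, np.abs(lam).max())
    mult = int(np.sum(lam - lam[0] < tol * scale))
    M = N + 1 - mult                             # rank of T + c0 * I
    T_M = T[:M + 1, :M + 1] + c0 * np.eye(M + 1)
    _, vecs = np.linalg.eigh(T_M)
    e = vecs[:, 0]                               # null vector of T_M
    roots = np.roots(e[::-1])                    # zeros of sum_j e_j z^j
    theta = np.angle(roots)
    k = np.arange(1, N + 1)
    vander = np.exp(1j * np.outer(k, theta))     # Vandermonde system
    rho = np.linalg.lstsq(vander, c, rcond=None)[0].real
    return c0, theta, rho
```

Applied to moments generated from a small set of known angles and positive weights, the routine should return those angles and weights (up to ordering), together with $c_0$ equal to the sum of the weights.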


3.2 Approximations of Periodic Function by Periodic Splines. 


We start with the one-dimensional case of a real periodic function f(t),
defined on $[0, 2\pi]$ (periodic with the period $2\pi$) and having
well-defined leading N + 1 (complex) Fourier coefficients $c_0, \dots, c_N$.
Because f(t) is real, one extends $c_k$ to negative k by defining:
$c_{-k} = c_k^*$ for positive k.

Since we consider a problem of approximating a periodic function by periodic
splines, we can use instead of f(t) its truncated expansion in the Fourier
series:

\[ f_N(t) = \sum_{k=-N}^{N} c_k\, e^{ikt}, \]

where

\[ c_k = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, e^{-ikt}\, dt. \]


The periodic spline sp(t) on $[0, 2\pi]$ can be defined in two different
ways, depending on whether the end points of the period are included in the
list of knots. In the simpler definition, the periodic spline sp(t) of degree
m on $[0, 2\pi]$ has M knots, starting with $t_0 = 0$. These are:

\[ 0 = t_0 < \dots < t_{M-1} < 2\pi, \]

and we can add by periodicity $t_M = 2\pi$. The spline has the generic
representation:

\[ sp(t) = p_m(t) + \sum_{i=1}^{M-1} a_i\, (t_i - t)_+^m, \]

with additional periodicity conditions satisfied:

\[ sp^{(k)}(2\pi) = sp^{(k)}(0), \quad k = 0, \dots, m-1. \]


This leaves a total of $2M - 1$ real variables in sp(t) ($M - 1$ for variable
knots $t_i$, $i = 1, \dots, M-1$, and a total of M for free coefficients:
$a_i$ at $(t_i - t)_+^m$, $i = 1, \dots, M-1$, and for a constant term of
$p_m(t)$). This allows us to match a total of $2M - 1$ real Fourier
coefficients of sp(t) with Fourier coefficients of $f_{M-1}(t)$ (or f(t)), or
$c_j$ for $j = 0, \dots, M-1$. The periodicity conditions uniquely define all
but the constant coefficients of $p_m(t)$ in terms of $t_i, a_i$,
$i = 1, \dots, M-1$.

Let us write $p_m(t)$ as $\sum_{k=0}^{m} p_{m,k}\, t^k$, so that the leading
coefficient of $p_m(t)$ (at $t^m$) is $p_{m,m}$. Then the periodicity
conditions uniquely determine all non-constant coefficients $p_{m,k}$ for
$k = 1, \dots, m$ in terms of $t_i, a_i$, $i = 1, \dots, M-1$, from
differences of derivatives at $t = 2\pi$ and $t = 0$:

\[ \frac{1}{k!} \frac{d^k}{dt^k}\, p_m(2\pi) = \frac{1}{k!} \frac{d^k}{dt^k}\, p_m(0) + (-1)^k \binom{m}{k} \sum_{i=1}^{M-1} a_i\, t_i^{m-k}, \quad k = 0, \dots, m-1. \]

The constant coefficient $p_{m,0}$ of $p_m(t)$ is not determined from the
periodicity condition, and is determined by equating the constant Fourier
coefficients of sp(t) and $f_{M-1}(t)$ (i.e., $c_0$).

After m-th order differentiation we can reduce the problem of fitting an m-th
degree spline to the problem of fitting a piecewise-constant function (spline
of degree 0) with M jump points to the Fourier expansion of
$f_{M-1}^{(m)}(t)$. In the notations above, $(t_i - t)_+^0$ is a unit
step-function with a jump at $t = t_i$. Using integration by parts we get for
any $2\pi$-periodic function h(t),

\[ \int_0^{2\pi} h'(t)\, e^{int}\, dt = -(in) \int_0^{2\pi} h(t)\, e^{int}\, dt. \]


This means that in order to determine the m-th degree spline sp(t) of the form
above (or more specifically, its free parameters: $t_i$,
$i = 1, \dots, M-1$, coefficients $a_i$ at $(t_i - t)_+^m$ for
$i = 1, \dots, M-1$, and a constant term $p_{m,0}$), we need to solve the
problem of matching the $M - 1$ Fourier coefficients $c'_k$,
$k = 1, \dots, M-1$, of $f_{M-1}^{(m)}(t)$:

\[ f_{M-1}^{(m)}(t) = \sum_{k=-(M-1)}^{M-1} (ik)^m c_k\, e^{ikt}, \qquad c'_k = (ik)^m \cdot c_k, \]

with $M - 1$ Fourier coefficients of the piecewise-constant function s(t) on
$[0, 2\pi]$ with M jumps at $t_i$. Assuming, as above, that $t_0 = 0$ and
$t_M = 2\pi$, we define s(t) on $[0, 2\pi)$ as follows:

\[ s(t) = \beta_i \quad \text{for } t \in [t_i, t_{i+1}), \quad i = 0, \dots, M-1, \]


and extend s(t) everywhere on the real axis by $2\pi$-periodicity. The
relationship between $\beta_i$ in the definition of s(t) and coefficients
$a_i$ ($i = 1, \dots, M-1$) and the leading coefficient $p_{m,m}$ (of
$p_m(t)$) in the definition of sp(t) is the following one:

\[ \beta_i = m! \Big( p_{m,m} + (-1)^m \sum_{k=i+1}^{M-1} a_k \Big) \quad \text{for } i = 0, \dots, M-1. \]

We also define (by periodicity) $\beta_M = \beta_0$.
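It is straightforward to compute the Fourier coefficients of the (periodic)
piecewise-constant function s(t) on $[0, 2\pi]$. For this let us denote
$z_j = e^{-i t_j}$, $j = 0, \dots, M$, so that $z_0 = z_M = 1$ as assumed
above. Then

\[ \frac{1}{2\pi} \int_0^{2\pi} s(t)\, e^{-int}\, dt = \frac{1}{2\pi i n} \sum_{j=0}^{M-1} \beta_j \big( z_j^n - z_{j+1}^n \big), \]

for $n \ne 0$. Changing the summation order, remembering that $z_0 = z_M$,
$\beta_0 = \beta_M$, and adding a new notation $\beta_{-1} = \beta_{M-1}$, we
get:

\[ \frac{1}{2\pi} \int_0^{2\pi} s(t)\, e^{-int}\, dt = \frac{1}{2\pi i n} \sum_{j=0}^{M-1} \nu_j\, z_j^n, \qquad \nu_j = \beta_j - \beta_{j-1}. \]

In terms of $a_i$, the definition of $\nu_i$ is straightforward:

\[ \nu_i = (-1)^{m+1}\, m!\, a_i \quad \text{for } i = 1, \dots, M-1, \qquad \nu_0 = (-1)^m\, m! \sum_{k=1}^{M-1} a_k. \]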
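
The jump-sum formula for the Fourier coefficients of a step function can be checked numerically. The Python sketch below compares a direct midpoint-rule integration with the closed form $\frac{1}{2\pi i n} \sum_j \nu_j z_j^n$, under the assumptions $z_j = e^{-i t_j}$, $\nu_j = \beta_j - \beta_{j-1}$ (with $\beta_{-1} = \beta_{M-1}$) and the kernel $e^{-int}$. Function names and the sample knots are hypothetical.

```python
import numpy as np

def fourier_coeff_numeric(knots, beta, n, pts=4000):
    """(1/2pi) * integral of s(t) exp(-i n t) over [0, 2pi], midpoint rule.

    s(t) = beta[i] on [t_i, t_{i+1}), with t_0 = 0 and t_M = 2*pi.
    """
    edges = np.append(knots, 2.0 * np.pi)
    total = 0j
    for i in range(len(knots)):
        h = (edges[i + 1] - edges[i]) / pts
        mid = edges[i] + h * (np.arange(pts) + 0.5)
        total += beta[i] * np.sum(np.exp(-1j * n * mid)) * h
    return total / (2.0 * np.pi)

def fourier_coeff_from_jumps(knots, beta, n):
    """Closed form (1 / (2 pi i n)) * sum_j nu_j z_j^n, valid for n != 0."""
    z = np.exp(-1j * np.asarray(knots, dtype=float))
    beta = np.asarray(beta, dtype=float)
    nu = beta - np.roll(beta, 1)      # nu_j = beta_j - beta_{j-1}, cyclic
    return np.sum(nu * z**n) / (2j * np.pi * n)
```

Since the jumps $\nu_j$ of a periodic step function always sum to zero, the side condition $\sum_j \nu_j = 0$ appears automatically in this formulation.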


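This leads to the complete system of equations on $t_i, a_i$,
$i = 1, \dots, M-1$, in terms of the Fourier coefficients $c_n$ of f(t):

\[ 2\pi \cdot (in)^{m+1} \cdot c_n = \sum_{j=0}^{M-1} \nu_j \cdot z_j^n, \quad n = 1, \dots, M-1, \]

\[ 0 = \sum_{j=0}^{M-1} \nu_j, \]

\[ 2\pi \cdot c_0 = \frac{1}{m+1} \sum_{i=1}^{M-1} a_i\, t_i^{m+1} + \sum_{k=0}^{m} \frac{p_{m,k}}{k+1}\, (2\pi)^{k+1}. \]

These equations are supplemented by:

\[ \nu_i = (-1)^{m+1}\, m!\, a_i \quad \text{for } i = 1, \dots, M-1, \]

\[ \frac{1}{k!} \frac{d^k}{dt^k}\, p_m(2\pi) - \frac{1}{k!} \frac{d^k}{dt^k}\, p_m(0) = (-1)^k \binom{m}{k} \sum_{i=1}^{M-1} a_i\, t_i^{m-k}, \quad k = 0, \dots, m-1. \]


3.3 Solution of Periodic Spline Problem.
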
The system of equations above provides the solution to the problem of spline
approximation using matching of trigonometric moments. As the equations
show, the problem is entirely reduced to the solution of the following main
trigonometric problem:

\[ C_n = \sum_{j=0}^{M-1} \nu_j \cdot z_j^n, \quad n = 1, \dots, M-1, \]

\[ \sum_{j=0}^{M-1} \nu_j = 0. \]

Here $C_n$ are properly modified Fourier coefficients:

\[ C_n = 2\pi \cdot (in)^{m+1} \cdot c_n, \quad n \ne 0, \]

and $z_j$ are lying on the unit circle: $z_j = e^{-i t_j}$,
$j = 0, \dots, M-1$, so that $z_0 = 1$.


There are two slightly different solutions to this problem. One is based on
the standard approach using Padé approximation; this time it is a two-point
Padé approximation at $z = 0$ and $z = \infty$. While straightforward, it
leaves open the usual questions of normality and rigorous proof of the
location of zeroes of Padé approximants on the unit circle. An alternative
approach is based on the Carathéodory theorem, and provides the necessary
simple criteria of normality and the proof of the zeroes' location on the unit
circle. We present both solutions, which complement each other.

To arrive at the Padé approximation representation, we introduce the
polynomial Q(z) of degree M:

\[ Q(z) = \prod_{j=0}^{M-1} (z - z_j), \]

and a rational function $R(z) = \sum_{j=0}^{M-1} \frac{\nu_j}{z - z_j}$,
which can be written as a ratio of two polynomials:

\[ R(z) = \frac{P(z)}{Q(z)}. \]

Because of the equations on $\nu_j$, $\deg_z(P) \le M - 2$ (or
$R(z) = O(z^{-2})$ as $z \to \infty$). Expanding R(z) at $z = 0$ and
$z = \infty$, and using the fact that $z_j$ lie on the unit circle
($z_j^* = z_j^{-1}$) and $\nu_j$ are real, we can see that R(z) approximates
two series expansions associated with Fourier coefficients $C_k$,
$k = 1, \dots, M-1$. The two expansions are:

\[ f^+(z) = \sum_{k=1}^{M-1} C_k\, z^{-k-1}, \qquad f^-(z) = -\sum_{k=1}^{M-1} C_{-k}\, z^{k-1} \]

at $z = \infty$ and $z = 0$, respectively. Everywhere above and below we set
$C_0 = 0$ and $C_{-k} = C_k^*$ for positive k.
The expansions of R(z) at $z = 0$ and $z = \infty$ are

\[ R(z) = \sum_{k=0}^{\infty} z^{-k-1} \cdot \sum_{j=0}^{M-1} \nu_j z_j^k \quad \text{at } z \to \infty, \]

\[ R(z) = -\sum_{k=0}^{\infty} z^{k} \cdot \sum_{j=0}^{M-1} \frac{\nu_j}{z_j^{k+1}} \quad \text{at } z \to 0. \]

Thus the approximation properties of R(z), equivalent to the original
equations, are the following:

\[ R(z) = f^+(z) + O(z^{-M-1}) \quad \text{at } z \to \infty, \]

\[ R(z) = f^-(z) + O(z^{M-1}) \quad \text{at } z \to 0. \]

These approximation properties can be replaced by

\[ P(z) = Q(z) \cdot f^+(z) + O(z^{-1}) \quad \text{at } z \to \infty, \]

\[ P(z) = Q(z) \cdot f^-(z) + O(z^{M-1}) \quad \text{at } z \to 0. \]

The two approximations above can be written as a system of linear equations
on coefficients of the polynomial Q(z). If Q(z) is expanded in powers of z as:

\[ Q(z) = \sum_{i=0}^{M} Q_i\, z^i, \]

then we get the system of linear equations on $Q_j$, $j = 0, \dots, M$:

\[ \sum_{j=0}^{M} Q_j\, C_{j-k-1} = 0, \quad k = 0, \dots, M-2. \]

In the equations above we set $C_0 = 0$ and $C_{-k} = C_k^*$ for positive k.
The system of equations on $Q_j$ above ($M - 1$ of them) is supplemented by
the following one:

\[ \sum_{j=0}^{M} Q_j = 0, \]

equivalent to the statement that $Q(1) = 0$ or $z_0 = 1$. In the case of
normality of the two-point Padé approximation, the system of M equations on
$Q_j$ uniquely determines the polynomial Q(z) up to the (normalization of)
the leading coefficient $Q_M$; so one can add the (M + 1)-st equation

\[ Q_M = 1. \]
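
As a sketch of how this linear system might be solved in practice, the Python fragment below assembles the $M - 1$ Toeplitz-type equations $\sum_j Q_j C_{j-k-1} = 0$ (this index pattern is an assumption consistent with $C_n = \sum_j \nu_j z_j^n$), adds $Q(1) = 0$ and the normalization $Q_M = 1$, and solves for the coefficients. It assumes the normal case, in which the system is non-singular; the function name `pade_denominator` is hypothetical.

```python
import numpy as np

def pade_denominator(C):
    """C = [C_1, ..., C_{M-1}]; returns [Q_0, ..., Q_M] with Q_M = 1."""
    C = np.asarray(C, dtype=complex)
    M = len(C) + 1

    def C_ext(k):
        # conventions of the text: C_0 = 0 and C_{-k} = conj(C_k)
        if k == 0:
            return 0.0
        return C[k - 1] if k > 0 else np.conj(C[-k - 1])

    A = np.zeros((M + 1, M + 1), dtype=complex)
    for k in range(M - 1):
        for j in range(M + 1):
            A[k, j] = C_ext(j - k - 1)   # sum_j Q_j C_{j-k-1} = 0
    A[M - 1, :] = 1.0                    # Q(1) = 0
    A[M, M] = 1.0                        # normalization Q_M = 1
    rhs = np.zeros(M + 1, dtype=complex)
    rhs[M] = 1.0
    return np.linalg.solve(A, rhs)
```

When the data $C_n$ come from distinct nodes $z_j$ on the unit circle with $z_0 = 1$ and non-zero real weights summing to zero, the zeros of the computed Q(z), for example via `np.roots(Q[::-1])`, should reproduce those nodes.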


In the generic case the two-point Padé approximations are normal and 
polynomials Q(z) have all roots distinct and lying on the unit circle. How- 
ever, this approach based on Padé approximations alone (or solution of the 
system of Toeplitz equations) does not always guarantee the solution of the 
problem of finding the periodic spline with proper matching trigonometric 
moments. Here the situation is similar to that of splines matching moments 
on the interval, where sometimes the orthogonal polynomials have zeroes
outside of the interval that supports the measure. The problem of matching the
trigonometric moments is different: it always has a solution. Moreover, this
solution (as a function of coefficients $c_k$) is stable.

The explicit solution to the problem of finding the periodic spline with
proper matching trigonometric moments can be derived from the Carathéodory
theorem (or, more precisely, from the method used in the proof of the
Carathéodory theorem). In this approach we start with the system of
equations, as in the proof of Carathéodory theorem:

\[ C'_n = \sum_{j=1}^{M-1} \nu_j \cdot z_j^n, \quad n = 0, \dots, M-1. \]

Here $C'_n$, $n = 1, \dots, M-1$, are variables, while $C'_0$ is dependent on
$C'_n$, $n = 1, \dots, M-1$, and is defined from the eigenvalue equation:

\[ |T' + C'_0 \cdot I_M| = 0, \]

for

\[ (T')_{i,j} = C'_{j-i}, \quad i, j = 0, \dots, M-1, \; i \ne j, \qquad (T')_{i,i} = 0. \]

(More precisely, $C'_0$ is the largest real number satisfying this eigenvalue
equation.)

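We add $z_0 = 1$ (for $t_0 = 0$) to the list of roots
$\{z_j\}_{j=1}^{M-1}$ on the unit circle with a variable $\nu_0$ as its
weight. Thus we set $C'_n + \nu_0 = C_n$, for all $n = 1, \dots, M-1$. In
these notations the system of nonlinear equations

\[ C_n = \sum_{j=0}^{M-1} \nu_j \cdot z_j^n, \quad n = 1, \dots, M-1, \]

\[ \sum_{j=0}^{M-1} \nu_j = 0 \]

can be represented, according to the (proof of) Carathéodory theorem, as a
determinantal identity:

\[
\begin{vmatrix}
-\nu_0 & C_1 - \nu_0 & \cdots & C_{M-1} - \nu_0 \\
C_{-1} - \nu_0 & -\nu_0 & \cdots & C_{M-2} - \nu_0 \\
\vdots & \vdots & \ddots & \vdots \\
C_{-M+1} - \nu_0 & C_{-M+2} - \nu_0 & \cdots & -\nu_0
\end{vmatrix} = 0.
\]

This equation can be written in the form

\[ |T - \nu_0 \cdot U_M| = 0, \]

where, as above, $(T)_{i,j} = C_{j-i}$, $i, j = 0, \dots, M-1$, for
$i \ne j$ and $(T)_{i,i} = 0$, and $U_M$ is the "all ones" matrix:
$(U_M)_{i,j} = 1$, $i, j = 0, \dots, M-1$. Consequently, we get a simple
linear equation, defining $\nu_0$:

\[ |T| - \nu_0 \cdot |T'| = 0, \]

where T' is an algebraic complement of (elements of) T; since $U_M$ has rank
one, $|T'|$ here stands for the sum of the algebraic complements (cofactors)
of all elements of T.

Once $\nu_0$ is determined from this equation, all other $\nu_j$ and
$z_j = e^{i\theta_j}$ for $j = 1, \dots, M-1$ are determined from the
Carathéodory theorem (as above) with $C'_j = C_j - \nu_0$ for
$j = 1, \dots, M-1$.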
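
Because the "all ones" matrix has rank one, the determinant $|T - \nu U_M|$ is a linear function of $\nu$, so $\nu_0$ can be obtained from two determinant evaluations. The Python sketch below illustrates this under the assumptions $(T)_{i,j} = C_{j-i}$, $C_{-k} = C_k^*$ and a zero diagonal; it also assumes the non-degenerate case in which the two determinants differ. The function name `nu0_from_determinants` is hypothetical.

```python
import numpy as np

def nu0_from_determinants(C):
    """C = [C_1, ..., C_{M-1}]; returns nu_0 solving |T - nu_0 * U_M| = 0."""
    C = np.asarray(C, dtype=complex)
    M = len(C) + 1
    T = np.zeros((M, M), dtype=complex)
    for i in range(M):
        for j in range(M):
            if j > i:
                T[i, j] = C[j - i - 1]           # C_{j-i}
            elif j < i:
                T[i, j] = np.conj(C[i - j - 1])  # C_{-(i-j)} = conj(C_{i-j})
    U = np.ones((M, M))
    d0 = np.linalg.det(T)
    d1 = np.linalg.det(T - U)
    # det(T - nu * U) = d0 - nu * (d0 - d1), since U has rank one
    return (d0 / (d0 - d1)).real
```

The difference `d0 - d1` plays the role of $|T'|$ in the linear equation above; when it vanishes, the division fails, which corresponds to the degenerate situation requiring knots in general position.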

This method is stable and also has the advantage that it determines the
minimal number M of knots $t_i$ with $t_0 = 0$ of the periodic spline sp(t) of
degree m that matches N Fourier coefficients $c_j$ of f(t) (for $M \le N$).




There are, however, some cases when $|T'| = 0$ and in those cases one
cannot get any periodic spline sp(t) of degree m that matches M Fourier
coefficients $c_j$ of f(t) and has M knots on $[0, 2\pi]$ with $t_0 = 0$. It
turns out that there is still a stable solution in this case as well; it just
requires M knots to be in a general position in $[0, 2\pi]$.

Here we will use a second definition of a periodic spline $sp_2(t)$ on
$[0, 2\pi]$ that does not fix a knot at $t_0 = 0$. In the second, more
complex, definition, the periodic spline $sp_2(t)$ of degree m on
$[0, 2\pi]$ has M knots:

\[ 0 \le t_0 < \dots < t_{M-1} < 2\pi, \]

and we can add by periodicity $t_M = t_0$. The spline has the form:

\[ sp_2(t) = p_m(t) + \sum_{i=0}^{M-1} a_i\, (t_i - t)_+^m, \]

satisfying very strong periodicity conditions (of $C^m$-smoothness):

\[ sp_2^{(k)}(2\pi) = sp_2^{(k)}(0), \quad k = 0, \dots, m. \]

This leaves a total of 2M real variables in $sp_2(t)$: M for variable knots
$t_i$, $i = 0, \dots, M-1$, $M - 1$ for coefficients $a_i$ at
$(t_i - t)_+^m$, $i = 0, \dots, M-1$, and one for a constant term of
$p_m(t)$. The single relation between $a_i$ is the following one:

\[ a_0 + \dots + a_{M-1} = 0. \]

This allows us to match $2M - 1$ real Fourier coefficients of $sp_2(t)$ with
Fourier coefficients of $f_{M-1}(t)$ (or f(t)), or $c_j$ for
$j = 0, \dots, M-1$, and still leaves one extra degree of freedom.

This extra degree of freedom is very easy to observe. Since f(t) is
$2\pi$-periodic, so is $f(t - t_0)$ for any (real) $t_0$, and Fourier
coefficients of $f(t - t_0)$ are related to Fourier coefficients of f(t) by
multiplicative factors of $e^{-i n t_0}$. Then we apply essentially the
"simple periodic splines" methods, described above, to $f(t - t_0)$ but now
with M knots $t_i \leftarrow t_i - t_0$, $i = 0, \dots, M-1$ (so one of the
knots is a fixed one at t = 0). As long as the corresponding
$T' = T'(t_0)$ is non-singular (and that is the generic case), one has an
extra parameter $t_0$ to fit. It can be used, for example, to get a better
$L_2$-approximation by a periodic spline $sp_2(t)$ of degree m with M knots
of f(t) that matches M Fourier coefficients of f(t). It also can be used to
match one extra real or imaginary part of the Fourier coefficient $c_M$.

There are a variety of algorithms that can be used for the computations
of the periodic spline approximations sp(t) and $sp_2(t)$. For a fixed degree
m the basic algorithm is reduced to operations with Toeplitz matrices (or the
construction of two-point Padé approximations). The standard (slow) methods
of computation are of complexity $O(M^2)$. These methods are either based on
direct solution of Toeplitz equations or the use of the recurrence relations
connecting Padé approximants Q(z) for increasing M. The recurrence relations
follow from the classical three-term recurrence relations connecting
polynomials orthogonal on the unit circle, see [5]. Those polynomials have
the form:

\[
p_n(z) =
\begin{vmatrix}
c_0 & c_{-1} & \cdots & c_{-n} \\
c_1 & c_0 & \cdots & c_{-n+1} \\
\vdots & \vdots & & \vdots \\
c_{n-1} & c_{n-2} & \cdots & c_{-1} \\
1 & z & \cdots & z^n
\end{vmatrix}.
\]

While all zeroes of $p_n(z)$ lie within the unit circle $|z| < 1$, one
constructs the "para-orthogonal" polynomials [J89] that have their zeroes on
the unit circle $|z| = 1$:

\[ p_n(z) + w \cdot \tilde{p}_n(z) \quad \text{for } |w| = 1, \]

where $\tilde{p}_n(z) = z^n \cdot (p_n)^*(1/z)$. Two-point Padé approximants
Q(z) and polynomials $Q_e(z)$ from the proof of Carathéodory theorem are
examples of para-orthogonal polynomials, see [J89].
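
The location of zeroes described here can be observed on a small example. The Python sketch below builds $p_n(z)$ from the determinant by cofactor expansion along the last row, then forms a para-orthogonal combination. It is an illustration under stated assumptions: $(p_n)^*$ is read as $p_n$ with complex-conjugated coefficients, and the test sequence $c_k = (r e^{i\varphi})^k$, $c_{-k} = c_k^*$ (moments of a Poisson-kernel measure, positive definite for $0 < r < 1$) is a hypothetical choice, as are the function names.

```python
import numpy as np

def determinant_polynomial(c_pos, n):
    """Coefficients (lowest degree first) of the determinant p_n(z).

    c_pos = [c_0, c_1, ..., c_n]; c_{-k} is taken as conj(c_k).
    """
    def c(k):
        return c_pos[k] if k >= 0 else np.conj(c_pos[-k])

    # rows i = 0..n-1 of the determinant: entry (i, j) = c_{i-j}
    top = np.array([[c(i - j) for j in range(n + 1)] for i in range(n)],
                   dtype=complex)
    coeffs = np.empty(n + 1, dtype=complex)
    for j in range(n + 1):
        minor = np.delete(top, j, axis=1)              # drop column j
        coeffs[j] = (-1) ** (n + j) * np.linalg.det(minor)
    return coeffs

def para_orthogonal(coeffs, w):
    """Coefficients of p_n(z) + w * z^n * conj(p_n)(1/z), with |w| = 1."""
    return coeffs + w * np.conj(coeffs[::-1])

r, phi, n = 0.5, 0.7, 4
c_pos = [r**k * np.exp(1j * k * phi) for k in range(n + 1)]
p = determinant_polynomial(c_pos, n)
zeros_p = np.roots(p[::-1])
zeros_para = np.roots(para_orthogonal(p, np.exp(0.3j))[::-1])
```

In this example the zeroes of $p_n$ stay strictly inside the unit disc, while those of the para-orthogonal combination land on the unit circle, which is the property that makes such polynomials suitable for locating knots.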

The recurrence relations arising from polynomials orthogonal on the unit
circle provide a recursive approach to the construction of optimal periodic
splines sp(t) and $sp_2(t)$. In this approach one can increase the number M
of knots in such a way that the locations of the previous knots interleave
the new ones. The recursive approach based on Padé approximants also provides
a fast method of computation of splines sp(t) for a fixed f(t) and m but
increasing M. This recursive approach, based on a conventional "divide and
conquer" polynomial interpolation with fast polynomial multiplication and
division, results in $O(M \log^2 M)$ complexity.


3.4 A Very Simple Example. 
We present a very simple example of the single-harmonic case:
f(t) = cos(t),


or $c_k = 0$ unless $k = \pm 1$, where $c_{\pm 1} = 1/2$. In this case one can very easily
determine analytically the knots $t_i$ and weights $a_i$, and the quality of the
spline approximation sp(t), as described in the previous section, to f(t) in
various norms. There are 3 cases of different parity of m and M.

The first, typical, case is of even m and odd M. We put M = 2n − 1, and
the set of M knots $t_j$, j = 0,...,M − 1 is:

$$\{0\} \cup \left\{ \frac{\pi j}{n} : j = 1, \ldots, n-1 \right\} \cup \left\{ \frac{\pi j}{n} : j = n+1, \ldots, 2n-1 \right\}$$

(with $t_0 = 0$). The corresponding $v_j$ are:

$$v_j = (-1)^{m/2+1} \cdot \sin(t_j) \quad \text{for } j = 1, \ldots, M$$

(m is even).

The second case is that of odd m and even M. We put M = 2n − 2 for
even n. (The same solution also covers the case M = 2n − 1 for even n.) Here
the set of M knots $t_j$, j = 0,...,M − 1:

$$\{0, \pi\} \cup \left\{ \frac{\pi j}{n} : j = 1, \ldots, 2n-1 \text{ and } j \text{ not divisible by } \frac{n}{2} \right\}$$

($\frac{n}{2}$ is an integer). The corresponding $v_j$ are:
vz = (τὴ). -cos(t;) for 7 = 0,...,M 


(m is odd). 
The third (and last) case is that of odd m and odd M with M ≡ 1 (mod 4). We
put M = 2n − 1 for odd n. Here the set of M knots $t_j$, j = 0,...,M − 1:


{O,m}U {Ξ :} = 1,...,2n—1 for odd j andj # n, 3n}, 
n 
The corresponding v; are: 
ν; =  (-᾿οἡ τη). τ -cos(t;) ἴοσ 2 = 1,...,M 
n 


(m is odd) and $v_0 = 0$ corresponding to $t_0 = 0$.
In all these cases, $t_j$ and $v_j$ uniquely determine the spline approximation
sp(t) (from formulas presented above for $a_i$, i = 1,...,M − 1 and $P_m(t)$).
The standard measure of the average quality of the approximation of f(t) by
sp(t) for an arbitrary f(t) is based on the $L_2$-norm:

$$\left( \frac{1}{2\pi} \int_0^{2\pi} (f(t) - sp(t))^2 \, dt \right)^{1/2}.$$


This quantity can be determined analytically using the Parseval identity and
the Fourier expansion of f(t) − sp(t) that is easily computable in terms of
trigonometric sums:

$$\sum_{j=0}^{M-1} v_j e^{i k t_j}, \quad k = M, \ldots$$

and Fourier coefficients of f(t). In turn, trigonometric sums are computed
using P(z), Q(z) polynomials from two-point Padé approximation to $f_1(z)$, $f_2(z)$.
We define 


ypl2m4+1) (4 ΝΕ +) 4 yl2mt1)(y 4 x) 


Glm,n) = (2m + 1)! (2n)2m42 | 


for a polygamma function $\psi(z)$. This can be simplified to:

[Figs. 1-4: plots of cos(t), sp(t) and cos(t) − sp(t).]

2m+1 1 
— =] ae, A . eee fF Ba Pe ee RE Bee eee gy 
G(m, n) TT am+i (cot(n)) "5: (2m ae 1)!(2n)2m+2 


When f(t) = cos(t) in all three cases above we get the following expression
for $\epsilon$ as a function of m and n:


as 5G, n). 


As above, in case 1 we have m even and M = 2n − 1; in case 2 we
have m odd and M = 2n − 2 for even n; in case 3 we have m odd and
M = 2n − 1 for odd n.

Looking at $\log_2 \epsilon$, we can see that $\log_2 \epsilon < -24$ for m = 3, n > 99;
m = 4, n > 15; m = 5, n > 9; m = 6, n > 6; m = 7, n > 5; m > 8, n > 4;
m = 10, n > 3; m > 14, n > 2.

We include several plots of f(t) = cos(t) and of the corresponding spline
approximation sp(t) in Figs. 1-4. Fig. 5 shows the plot of the figure of merit
$\log_2 \epsilon$ as a function of m, n.

Fig. 5. $\log_2 \epsilon(m, n)$ as a function of m, n.


3.5 Multidimensional Closed Curves and Periodic Splines. 


The case of d-dimensional closed curves $f(t) = (f_1(t), \ldots, f_d(t))$
approximated by a periodic spline curve $sp(t) = (sp_1(t), \ldots, sp_d(t))$ is similar to the
case of d-dimensional curves approximated by d-dimensional splines on the
interval that match the maximal number of moments. However, here we are
using the two-point generalized Padé approximants (at z = 0 and z = ∞) of
the Second Kind. This problem also leads to an interesting generalization of
the Carathéodory theorem where one expresses d sequences $\{c_k^\alpha\}$ of Fourier
coefficients by trigonometric exponentials


$$\sum_{j=1}^{M} v_j^\alpha e^{i k t_j} \quad \text{for } \alpha = 1, \ldots, d$$

with M harmonics $e^{i t_j}$, j = 1,...,M common to all d sequences (but
different weights $v_j^\alpha$). Just as the Carathéodory theorem is used for spectral
estimation of a time series, the multidimensional problem is used for joint spectral
estimation of multiple time series (e.g., multi-channel data collected from the
same source at the same time).

In this problem, as above, we look at a d-dimensional curve $f(t) =
(f_1(t), \ldots, f_d(t))$ that is $2\pi$-periodic and is defined by its Fourier coefficients
$c_k^\alpha$ with the usual reality condition $c_{-k}^\alpha = \overline{c_k^\alpha}$ for α = 1,...,d. We are trying
to find a d-dimensional spline of degree m: $sp(t) = (sp_1(t), \ldots, sp_d(t))$,
defined on $[0, 2\pi]$ and $2\pi$-periodic, that has M knots on $[0, 2\pi]$ (common to all
individual spline functions $sp_\alpha(t)$):

$$0 \le t_0 < \ldots < t_{M-1} < 2\pi,$$

and matching the maximum number of Fourier coefficients $c_k^\alpha$ of $f_\alpha(t)$, α =
1,...,d. The form of individual spline functions is the same as above:

$$sp_\alpha(t) = P_{\alpha,m}(t) + \sum_{i=0}^{M-1} a_{\alpha,i} (t_i - t)_+^m.$$
The problem is reduced to the generalized two-point Padé approximation 
problem of the Second Kind (i.e., approximants with the common denominator 
Q(z)) for functions 


M-1 M-1 
fi(z) = Cee), fa(z) = -- δ΄ C% 2", 
k=1 k=1 


at z = ∞ and z = 0, respectively.
The solution is reduced to linear algebra problems and to the following set
of Toeplitz-like equations on the Fourier coefficients $c_k^\alpha$:


M 

δ gi: Ch, = 0,1 = 0,...,N° 

~=0 | 

for α = 1,...,d and $\sum_\alpha (N^\alpha + 1) > M$. The knots $t_j$ are determined from
the roots $z = z_j$ of the associated polynomial $Q(z) = \sum_{i=0}^{M} q_i z^i$ that lie on the
unit circle by the identification $z_j = e^{i t_j}$, j = 1,...,M.
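The passage from coefficients to knots can be sketched in the smallest non-trivial setting. The Python example below is an illustration only, assuming a single sequence (d = 1), M = 2 knots, and coefficients of the exact form c_k = v_1 e^{i k t_1} + v_2 e^{i k t_2}. It finds the monic quadratic Q(z) that annihilates the sequence (a Prony-type step standing in for the Toeplitz-like equations) and reads the knots off the arguments of its roots. The names `two_knots_from_coefficients` and `coefficients` are hypothetical helpers, not taken from the text.

```python
import cmath
import math


def coefficients(knots, weights, count):
    """c_k = sum_j v_j * exp(i k t_j) for k = 0, ..., count - 1."""
    return [sum(v * cmath.exp(1j * k * t) for t, v in zip(knots, weights))
            for k in range(count)]


def two_knots_from_coefficients(c):
    """Recover t_1 < t_2 in [0, 2*pi) from c[0..3] of the form above."""
    # Q(z) = z^2 + q1 z + q0 annihilates the sequence:
    #   c[k+2] + q1 * c[k+1] + q0 * c[k] = 0   for k = 0, 1.
    # Solve this 2x2 linear system by Cramer's rule.
    det = c[1] * c[1] - c[0] * c[2]
    q1 = (c[0] * c[3] - c[1] * c[2]) / det
    q0 = (c[2] * c[2] - c[1] * c[3]) / det
    # The roots of Q lie on the unit circle; their arguments are the knots.
    disc = cmath.sqrt(q1 * q1 - 4 * q0)
    roots = [(-q1 + disc) / 2, (-q1 - disc) / 2]
    return sorted(cmath.phase(z) % (2 * math.pi) for z in roots)


c = coefficients([0.7, 2.5], [1.0, 0.4], 4)
recovered = two_knots_from_coefficients(c)
```

With exact data the determinant equals −v_1 v_2 (z_1 − z_2)^2, so the system is solvable whenever the two knots are distinct and both weights are non-zero.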




References


[Ch85] Chudnovsky, D.V., and Chudnovsky, G.V.: Applications of Padé approximations
to diophantine inequalities in values of G-functions. Lecture Notes Math.,
Springer-Verlag, v. 1135, 9-51 (1985)

[Ch99] Chudnovsky, D.V., and Chudnovsky, G.V.: Solution of the pulse width
modulation problem using orthogonal polynomials and Korteweg-de Vries equations.
Proc. Nat'l Acad. Sci. USA, 96, no. 22, 12263-12268 (1999)

[C95] Cohen, F., Huang, Z., Yang, Z.: Invariant matching and identification of
curves using B-splines curve representation. IEEE Trans. Image Processing,
4(1), (1995)

[F87] Frontini, M., Gautschi, W., Milovanović, G.V.: Moment preserving spline
approximation on finite intervals. Numer. Math., 50, 503-518 (1987)

[G84] Gautschi, W.: Discrete approximation to spherically symmetric distributions.
Numer. Math., 44, 53-60 (1984)

[G86] Gautschi, W., Milovanović, G.V.: Spline approximations to spherically
symmetric distributions. Numer. Math., 49, 111-121 (1986)

[GS] Grenander, U., Szegő, G.: Toeplitz Forms and their Applications. U. of
California Press, Berkeley (1958)

[J89] Jones, W.B., Njåstad, O., Thron, W.J.: Moment theory, orthogonal
polynomials, quadrature, and continued fractions associated with the unit circle. Bull.
London Math. Soc., 21, 113-152 (1989)

[KH] Kreylos, O., Hamann, B.: On Simulated Annealing and the Construction of
Linear Spline Approximations for Scattered Data. IEEE Trans. on Visualization
and Computer Graphics, v. 7, no. 1, 17 (2001)

[LM94] Lu, F., Milios, E.E.: Optimal spline fitting to planar shape. Signal
Processing, 37, 129-140 (1994)

[M88] Micchelli, C.A.: Monosplines and moment preserving spline approximation.
In: Brass, H., Hämmerlin, G. (eds) Numerical Integration III, Birkhäuser, Basel,
130-139 (1988)

[MK88] Milovanović, G.V., Kovačević, M.A.: Moment-preserving spline
approximation and Turán quadratures. In: Agarwal, R.P., Chow, Y.M., Wilson, S.J.
(eds.), Numerical Mathematics Singapore, 1988, ISNM v. 86, Birkhäuser, Basel,
357-365 (1988)

[M00] Milovanović, G.V.: Quadratures with multiple nodes, power orthogonality,
and moment-preserving spline approximation. In: Gautschi, W., Marcellán, F.,
Reichel, L. (eds.) Numerical Analysis 2000, Vol. V, Quadrature and orthogonal
polynomials. J. Comput. Appl. Math. 127, 267-286 (2001)

[M95] Milanfar, P., Verghese, G., Karl, W., Willsky, A.: Reconstruction of
polygons from moments with connection to array processing. IEEE Trans. Signal
Processing, 43(2), (1995)

[M97] Meier, F.W., Schuster, G.M., Katsaggelos, A.K.: An efficient boundary
encoding scheme using B-spline curves which is optimal in the rate distortion sense.
In: Proc. of VCIP, (1997)

[MM] Mokhtarian, F., Mackworth, A.K.: A theory of multiscale, curvature-based
shape representation for planar curves. IEEE Trans. PAMI, 14(8), 789-805

[MP] Marciniak, K., Putz, B.: Approximation of spirals by piecewise curves of fewest
circular arc segments. Computer Aided Design, 16(2), 87-90 (1984)

[MW] Meek, D.S., Walton, D.J.: Approximating smooth planar curves by arc
splines. Journal of Computational and Applied Mathematics, 59, 221-231
(1995)

[N] Nyquist, H.: Certain topics in telegraph transmission theory. AIEE Trans.,
617-644 (1928)

[S] Szegő, G.: Orthogonal Polynomials. AMS, Providence, RI (1978)

[W] Wallner, J.: Generalized multiresolution analysis for arc splines. In: Dæhlen,
M., Lyche, T., Schumaker, L.L. (eds.) Mathematical Methods for Curves and
Surfaces II, Vanderbilt University Press, 537-544

[Z] Zaletelj, J., Pecci, R., Spaan, F., Hanjalic, A., Lagendijk, R.L.: Rate Distortion
Optimal Contour Compression Using Cubic B-Splines. European Signal
Processing Conference (EUSIPCO '98), Rhodos, GR (1998)


D. Chudnovsky et al. (eds.), Number Theory
© Springer-Verlag New York, Inc. 2004

Interactions between number theory and
operator algebras in the study of the Riemann
zeta function (d'après Bost-Connes and
Connes)


PAULA B. COHEN 


August 8, 2000 


Abstract 


This paper arose as a written version of a lecture given at the City University of 
New York Number Theory Seminar in April 1997. The lecture was an overview of ideas 
of Bost-Connes and of Connes on possible ways to approach the study of the Riemann 
zeta function using ideas inspired by noncommutative geometry. The work of Connes is 
directly aimed at a solution of the Generalised Riemann Hypothesis and has undergone 
substantial improvement in the intervening time. In this article we give an overview of 
the original work and these later developments. 


1 Introduction 


In 1859 Riemann published an important foundational paper on the Riemann zeta function.
Recall that this function is given for Re(s) > 1 by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$$

and that it has a continuation to all the complex plane which is analytic except for a simple
pole at s = 1. It is straightforward to show that the Riemann zeta function has zeros at the
negative even integers and these are called the trivial zeros of the Riemann zeta function.
The Riemann Hypothesis predicts that the remaining zeros lie on the line Re(s) = 1/2. One
knows that the non-trivial zeros of $\zeta(s)$ lie in the band Re(s) ∈ ]0,1[. The generalisation of
this function for a number field is known as the Dedekind zeta function. It encodes much
important arithmetical information about the field. One of the major motivations of number
theory is to understand more fully the Dedekind zeta function, the most famous challenge
being to understand the locus of the zeros of this function and in particular to settle the
validity of the Generalised Riemann Hypothesis for these functions. Much powerful work
has been done in analytic number theory in the attempt to solve the Generalised Riemann
Hypothesis directly.

As the study of the zeros of the Riemann zeta function and its generalisations is so
difficult, one may ask how it is possible to recast the problem. For example, Pólya and
Hilbert proposed that if one can construct a Hilbert space H and an operator D in H whose
spectrum comprises the zeros of the Riemann zeta function in the band Re(s) ∈ ]0,1[, then
possibly one can settle whether or not $\sqrt{-1}(D - 1/2)$ is self-adjoint or whether D(1 − D) is
positive, which would imply the Riemann hypothesis. The point here is that the properties
of self-adjointness or of positivity are hopefully easier to check. It is important that the
construction of the Hilbert space and the operator should not depend a priori on the zeta
function, to avoid tautologies. An as yet non-rigorous approach to the Riemann hypothesis
initiated by Connes in [5] and further developed in [6], [7], includes an interesting and rigorous
interpretation of the zeros of the L-functions with Grössencharakter of a global field in terms
of the action of the idele class group on the coset space of the adeles modulo the principal
ideles.

This work of Connes derives in particular from his work with Bost [2] which was in 
turn inspired by ideas of Julia [11] and others. The aim is to enrich our knowledge of the 
Riemann zeta function by creating a dictionary between its properties and phenomena in 
statistical mechanics. The starting point of these approaches is the observation that, just
as the zeta functions encode arithmetic information, the partition functions of quantum
statistical mechanical systems encode their large-scale thermodynamical properties. The
first step therefore is to construct a quantum dynamical system with partition function the 
Riemann zeta function, or the Dedekind zeta function in the general number field case. In 
order for the quantum dynamical system to reflect the arithmetic of the primes, it must 
capture also some sort of interaction between them. This last feature translates in the 
statistical mechanical language into the phenomenon of spontaneous symmetry breaking at 
a critical temperature with respect to a natural symmetry group. In the region of high 
temperature, there is a unique equilibrium state as the system is in disorder and symmetric 
with respect to the action of the symmetry group. In the region of low temperature, a phase 
transition occurs and the symmetry is broken. This symmetry group acts transitively on a 
family of possible extremal equilibrium states. The construction of a quantum dynamical 
system with partition function the Riemann zeta function $\zeta(\beta)$ and spontaneous symmetry
breaking or phase transition at its pole β = 1 with respect to a natural symmetry group
was achieved by Bost and Connes in [2]. A different construction of the basic algebra using 
crossed products was proposed by Laca and Raeburn and extended to the number field case 
by them with Arledge in [1]. An extension of the work of Bost and Connes to general global 
fields was done by Harari and Leichtnam in [10]. The generalisation proposed by Harari and 
Leichtnam in [10] fails to capture the Dedekind zeta function as partition function in the 
case of a number field with class number greater than 1. Their partition function in that case 
is the Dedekind zeta function with a finite number of non-canonically chosen Euler factors 
removed. This prompted the author’s paper [3] where the full Dedekind zeta function is 
recovered as partition function. This is achieved by recasting the original construction of 
Bost and Connes more completely in terms of adeles and ideles. The symmetry group of 
the system constructed by Bost and Connes is a Galois group, in fact the Galois group over 
the rational number field of its maximal abelian extension. Using the Artin isomorphism, 
which says that this symmetry group is also the unit group of the finite ideles, Bost and 
Connes recover the actual Galois action on the elements of the maximal abelian extension 
via its action on the equilibrium states of the system. In the general number field case, 
the symmetry group is again the unit group of the finite ideles, but this group does not in 
general have a Galois interpretation. See [10] for a discussion of this point. 

In [7], Connes outlines a tantalising analogy between the Galois correspondence in number
theory and the classification of factors in the theory of von Neumann algebras. This is in
the spirit of the idea of André Weil that the key to understanding the Riemann hypothesis
may well lie in a decent Galois interpretation of the idele class group of a number field.
The difficulty here lies in the intervention of the archimedean valuations and lies beyond,
for example, what is known about the case of function fields of varieties over finite fields.
Connes maintains that aspects of the theory of the classification of factors represent a type
of Galois theory adapted to the real and complex numbers.

In this paper we provide an introduction both to the work of Bost-Connes and to the 
work of Connes on the Riemann hypothesis. Our purpose is to assist the interested reader 
to understand these works. We restrict our attention to the rational numbers and to the 
Riemann zeta function. It should be noted that a solution of the Riemann hypothesis may 
well necessarily involve studying the zeta function as part of a family of zeta functions in 
the spirit of [12]. 

Acknowledgements: The author thanks the Ellentuck Fund and the School of Mathematics
of the Institute for Advanced Study, Princeton, for their support during the preparation of
this paper.


2 The problem studied by Bost-Connes


Before stating the problem solved by Bost and Connes in [2] and its analogue for number
fields, we recall a few basic notions from the C*-algebraic formulation of quantum statistical
mechanics. For the background, see [4]. Recall that a C*-algebra B is an algebra over the
complex numbers C with an adjoint $x \mapsto x^*$, x ∈ B, that is, an anti-linear map with $x^{**} = x$,
$(xy)^* = y^* x^*$, x, y ∈ B, and a norm ||·|| with respect to which B is complete and addition
and multiplication are continuous operations. One requires in addition that $\|xx^*\| = \|x\|^2$
for all x ∈ B. All our C*-algebras will be assumed unital. The most basic example of a
noncommutative C*-algebra is $B = M_N(\mathbb{C})$ for N ≥ 2 an integer. The C*-algebra plays the
role of the "space" on which the system evolves, the evolution itself being described by a
1-parameter group of C*-automorphisms $\sigma : \mathbb{R} \to \mathrm{Aut}(B)$. The quantum dynamical system
is therefore the pair $(B, \sigma_t)$. It is customary to use the inverse temperature β = 1/kT rather
than the temperature T, where k is Boltzmann's constant. Then, one has the definition of
Kubo-Martin-Schwinger (KMS) of an equilibrium state at inverse temperature β. Recall
that a state $\varphi$ on a C*-algebra B is a positive linear functional on B satisfying $\varphi(1) = 1$. It
is the generalisation of a probability distribution.


Definition 1 Let $(B, \sigma_t)$ be a dynamical system, and $\varphi$ a state on B. Then $\varphi$ is an
equilibrium state at inverse temperature β, or KMS$_\beta$-state, if for each x, y ∈ B there is a function
$F_{x,y}(z)$, bounded and holomorphic in the band 0 < Im(z) < β and continuous on its closure,
such that for all t ∈ R,

$$F_{x,y}(t) = \varphi(x \sigma_t(y)), \qquad F_{x,y}(t + \sqrt{-1}\beta) = \varphi(\sigma_t(y) x). \qquad (1)$$

In the case where $B = M_N(\mathbb{C})$, every 1-parameter group $\sigma_t$ of automorphisms of B can be
written in the form,

$$\sigma_t(x) = e^{itH} x e^{-itH}, \qquad x \in B, \ t \in \mathbb{R},$$

for a self-adjoint matrix H = H*. Then for H > 0 and for all β > 0, there is a unique
KMS$_\beta$ equilibrium state for $(B, \sigma_t)$ given by

$$\varphi_\beta(x) = \mathrm{Trace}(x e^{-\beta H}) / \mathrm{Trace}(e^{-\beta H}), \qquad x \in M_N(\mathbb{C}). \qquad (2)$$


This has the familiar form of a Gibbs state and is easily seen to satisfy the KMS$_\beta$ condition
of Definition 1. The KMS$_\beta$ states can therefore be seen as generalisations of Gibbs states. The
normalisation constant $\mathrm{Trace}(e^{-\beta H})$ is known as the partition function of the system. A
symmetry group G of the dynamical system $(B, \sigma_t)$ is a subgroup of Aut(B) commuting
with σ:

$$g \circ \sigma_t = \sigma_t \circ g, \qquad g \in G, \ t \in \mathbb{R}.$$
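The claim that the Gibbs state satisfies the KMS condition can be checked numerically in a small case. The following Python sketch assumes a diagonal 3×3 matrix H with arbitrarily chosen eigenvalues and two arbitrary test matrices x, y (all numerical values here are illustrative assumptions). It compares φ(x σ_{t+iβ}(y)) with φ(σ_t(y) x), which is the boundary condition of Definition 1 with F_{x,y}(z) = φ(x σ_z(y)).

```python
import cmath
import math


def gibbs_state(x, energies, beta):
    """phi_beta(x) = Trace(x e^{-beta H}) / Trace(e^{-beta H}), H = diag(energies)."""
    weights = [math.exp(-beta * h) for h in energies]
    return sum(x[j][j] * w for j, w in enumerate(weights)) / sum(weights)


def evolve(y, energies, z):
    """sigma_z(y) = e^{izH} y e^{-izH} for diagonal H; z may be complex."""
    n = len(energies)
    return [[cmath.exp(1j * z * (energies[j] - energies[k])) * y[j][k]
             for k in range(n)] for j in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


energies = [0.0, 0.7, 1.9]          # eigenvalues of H (illustrative)
beta, t = 1.3, 0.4
x = [[1, 2j, 0.5], [0.3, -1, 1j], [2, 0.1, 0.7]]
y = [[0.2, 1, -1j], [1j, 0.5, 0.4], [-0.6, 2, 1]]

# F(t + i*beta) = phi(x sigma_{t + i beta}(y)) should equal phi(sigma_t(y) x).
lhs = gibbs_state(matmul(x, evolve(y, energies, t + 1j * beta)), energies, beta)
rhs = gibbs_state(matmul(evolve(y, energies, t), x), energies, beta)
```

The two values agree up to rounding error because the factor e^{-β(h_k − h_j)} produced by the imaginary time shift moves the Boltzmann weight from the index j to the index k.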


Consider now a system $(B, \sigma_t)$ with interaction. Then, guided by quantum statistical
mechanics, we expect to see the following features. When the temperature is high, so that β
is small, the system is in disorder, there is no interaction between its constituents and the
state of the system does not see the action of the symmetry group G: the KMS$_\beta$-state is
unique. As the temperature is lowered, the constituents of the system begin to interact.
At a critical inverse temperature $\beta_0$ a phase transition occurs and the symmetry is broken. The
symmetry group G then permutes transitively a family of extremal KMS$_\beta$-states generating
the possible states of the system after phase transition: the KMS$_\beta$-state is no longer
unique. This phase transition phenomenon is known as spontaneous symmetry breaking at
the critical inverse temperature $\beta_0$. The partition function should have a pole at $\beta_0$. For a
fuller explanation, see [2]. The problem solved by Bost and Connes was the following.


Problem 1: Construct a dynamical system $(B, \sigma_t)$ with partition function the zeta function
$\zeta(\beta)$ of Riemann, where β > 0 is the inverse temperature, having spontaneous symmetry
breaking at the pole β = 1 of the zeta function with respect to a natural symmetry group.


As mentioned in the introduction, the symmetry group is the unit group of the ideles,
given by $W = \prod_p \mathbb{Z}_p^*$ where the product is over the primes p and $\mathbb{Z}_p^* = \{u_p \in \mathbb{Q}_p : |u_p|_p = 1\}$.
We use here the normalisation $|p|_p = p^{-1}$. This is the same as the Galois group $\mathrm{Gal}(\mathbb{Q}^{ab}/\mathbb{Q})$.
Here $\mathbb{Q}^{ab}$ is the maximal abelian extension of the rational number field Q, which in turn is
isomorphic to its maximal cyclotomic extension, that is the extension obtained by adjoining
to Q all the roots of unity. The interaction detected in the phase transition comes about
from the interaction between the primes coming from considering at once all the embeddings
of the non-zero rational numbers Q* into the completions $\mathbb{Q}_p$ of Q with respect to the prime
valuations $|\cdot|_p$. The natural generalisation of this problem to the number field case was
solved in [3] and is the following.


Problem 2: Given a number field K, construct a dynamical system $(B, \sigma_t)$ with partition
function the Dedekind zeta function $\zeta_K(\beta)$, where β > 0 is the inverse temperature, having
spontaneous symmetry breaking at the pole β = 1 of the Dedekind zeta function with respect to a
natural symmetry group.


Recall that the Dedekind zeta function is given by

$$\zeta_K(s) = \sum_{C \subset \mathcal{O}} \frac{1}{N(C)^s}, \qquad \mathrm{Re}(s) > 1. \qquad (3)$$

Here $\mathcal{O}$ is the ring of integers of K and the summation is over the ideals C of K contained in
$\mathcal{O}$. The symmetry group is the unit group of the finite ideles of K. The main new difficulty
that arose in the number field case was to generalise to the case of class number greater than
1. One has to use the fact that the principal prime ideals already provide enough interaction
to engender a phenomenon of spontaneous symmetry breaking.

For the natural generalisation to the function field case see [10]. For the sake of expo- 
sition, we restrict ourselves in the sequel to the case of the rational numbers, that is to a 
discussion of Problem 1. 


3 Construction of the C*-algebra 


We give a different construction of the C*-algebra of [2] to that found in their original paper.
It is essentially equivalent to the construction of [1], except that we work with adeles and
ideles. In the generalisation to the number field case, this makes quite a difference. Let A
denote the finite adeles of Q, that is the restricted product of $\mathbb{Q}_p$ with respect to $\mathbb{Z}_p$. Recall
that this restricted product consists of the infinite vectors $(a_p)_p$, indexed by the primes p,
such that $a_p \in \mathbb{Q}_p$ with $a_p \in \mathbb{Z}_p$ for almost all primes p. The (finite) adeles form a ring
under componentwise addition and multiplication. The (finite) ideles J are the invertible
elements of the adeles. They form a group under componentwise multiplication. Let $\mathbb{Z}_p^*$ be
those elements $u_p \in \mathbb{Z}_p$ with $|u_p|_p = 1$. Notice that an idele $(u_p)_p$ has $u_p \in \mathbb{Q}_p^*$ with
$u_p \in \mathbb{Z}_p^*$ for almost all primes p. Let

$$R = \prod_p \mathbb{Z}_p, \qquad I = J \cap R, \qquad W = \prod_p \mathbb{Z}_p^*.$$

Further, let $\mathcal{I}$ denote the semigroup of integral ideals of Z. It is the semigroup of Z-modules
of the form mZ where m ∈ Z. Notice that I as above is also a semigroup. We have a natural
short exact sequence,

$$1 \to W \to I \to \mathcal{I} \to 1. \qquad (4)$$

The map $I \to \mathcal{I}$ in this short exact sequence is given as follows. To $(u_p)_p \in I$ associate the
ideal $\prod_p p^{\mathrm{ord}_p(u_p)}$ where $\mathrm{ord}_p(u_p)$ is determined by the formula $|u_p|_p = p^{-\mathrm{ord}_p(u_p)}$. It is clear
that this map is surjective with kernel W, that is that the above sequence is indeed short
exact. By the Strong Approximation Theorem we have


$$\mathbb{Q}/\mathbb{Z} \simeq A/R \simeq \bigoplus_p \mathbb{Q}_p/\mathbb{Z}_p \qquad (5)$$


and we have therefore a natural action of I on Q/Z by multiplication in A/R and transport
of structure. We use here that IR ⊂ R. Mostly we shall work in A/R rather than Q/Z.
We have the following straightforward Lemma (see [3]).


Lemma 1 For $a = (a_p)_p \in I$ and y ∈ A/R, the equation
$$ax = y$$
has $n(a) =: \prod_p p^{\mathrm{ord}_p(a_p)}$ solutions in x ∈ A/R. Denote these solutions by [x : ax = y].
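As an illustration of Lemma 1 in the simplest setting, one can take a to be the image of a positive integer m, so that n(a) = m, and work in Q/Z instead of A/R via (5). The Python sketch below is an assumption-laden example (the function name `solutions_mod_one` is hypothetical): it lists the solutions of mx = y in Q/Z, represented by rationals in [0, 1).

```python
from fractions import Fraction


def solutions_mod_one(m, y):
    """All x in Q/Z, represented in [0, 1), with m * x = y in Q/Z (integer m >= 1)."""
    y = Fraction(y) % 1
    # x = (y + k) / m for k = 0, ..., m - 1 are pairwise distinct modulo 1.
    return [(y + k) / m for k in range(m)]


sols = solutions_mod_one(6, Fraction(1, 4))
```

Here `sols` has exactly 6 distinct elements, in agreement with n(a) = 6, and each of them satisfies 6x ≡ 1/4 modulo 1.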


In the above lemma it is important to bear in mind that we are computing modulo R.
Now, let $\mathbb{C}[A/R] =: \mathrm{span}\{\delta_x : x \in A/R\}$ be the group algebra of A/R over C, so that
$\delta_x \delta_{x'} = \delta_{x+x'}$ for x, x' ∈ A/R. We have (see for comparison [1]),


Lemma 2 The formula

$$\alpha_a(\delta_y) = \frac{1}{n(a)} \sum_{[x : ax = y]} \delta_x$$

for a ∈ I defines an action of I by endomorphisms of C*(A/R).


The endomorphism $\alpha_a$ for a ∈ I is a one-sided inverse of the map $\delta_x \mapsto \delta_{ax}$ for x ∈ A/R, so
it is like a semigroup "division". The C*-algebra C*(A/R) can be thought of as the operator norm
closure of C[A/R] in its natural left regular representation in $\ell^2(A/R)$. We now appeal to
the notion of semigroup crossed product developed by Laca and Raeburn and used in [1],
applying it to our situation. A covariant representation of $(C^*(A/R), I, \alpha)$ is a pair (π, V)
where

$$\pi : C^*(A/R) \to B(\mathcal{H})$$

is a unital representation and

$$V : I \to B(\mathcal{H})$$

is an isometric representation in the bounded operators in a Hilbert space $\mathcal{H}$. The pair
(π, V) is required to satisfy,

$$\pi(\alpha_a(f)) = V_a \pi(f) V_a^*, \qquad a \in I, \ f \in C^*(A/R).$$

Notice that the $V_a$ are not in general unitary. Such a representation is given by (λ, L) on
$\ell^2(A/R)$ with orthonormal basis $\{\varepsilon_x : x \in A/R\}$ where λ is the left regular representation
of C*(A/R) on $\ell^2(A/R)$ and

$$L_a \varepsilon_y = \frac{1}{\sqrt{n(a)}} \sum_{[x : ax = y]} \varepsilon_x.$$


The universal covariant representation, through which all other covariant representations
factor, is called the (semigroup) crossed product $C^*(A/R) \rtimes_\alpha I$. This algebra is the universal
C*-algebra generated by the symbols $\{e(x) : x \in A/R\}$ and $\{\mu_a : a \in I\}$ subject to the
relations

$$\mu_a^* \mu_a = 1, \qquad \mu_a \mu_b = \mu_{ab}, \qquad a, b \in I, \qquad (6)$$

$$e(0) = 1, \qquad e(x)^* = e(-x), \qquad e(x)e(y) = e(x+y), \qquad x, y \in A/R, \qquad (7)$$

$$\frac{1}{n(a)} \sum_{[x : ax = y]} e(x) = \mu_a e(y) \mu_a^*, \qquad a \in I, \ y \in A/R. \qquad (8)$$


The relations in (6) reflect a multiplicative structure, those in (7) an additive structure and
those in (8) how these multiplicative and additive structures are related via the crossed
product action. Julia [11] observed that by using only the multiplicative structure of the
integers one cannot hope to capture an interaction between the different primes. When
u ∈ W then $\mu_u$ is a unitary, so that $\mu_u^* \mu_u = \mu_u \mu_u^* = 1$ and we have for all x ∈ A/R,

$$\mu_u e(x) \mu_u^* = e(u^{-1}x), \qquad \mu_u^* e(x) \mu_u = e(ux). \qquad (9)$$

Therefore we have a natural action of W as inner automorphisms of $C^*(A/R) \rtimes_\alpha I$ using
(9).

To recover the C*-algebra of [2] we must split the short exact sequence (4). The ideals
in $\mathcal{I}$ are all of the form mZ for some m ∈ Z. This generator m is determined up to sign.
Consider the image of |m| in J under the diagonal embedding $q \mapsto (q)_p$ of Q* into J, where
the p-th component of $(q)_p$ is the image of q in $\mathbb{Q}_p^*$ under the natural embedding of Q* in
$\mathbb{Q}_p^*$. The map

$$+ : m\mathbb{Z} \mapsto (|m|)_p \qquad (10)$$

defines a splitting of (4). Let $I_+$ denote the image and define B to be the semigroup
crossed product $C^*(A/R) \rtimes_\alpha I_+$ with the restricted action α from I to $I_+$. By transport
of structure using (5), this algebra is easily seen to be isomorphic to a semigroup crossed
product of C*(Q/Z) by $\mathbb{N}_+$, where $\mathbb{N}_+$ denotes the positive natural numbers. This is the
algebra constructed in [2] (see also [1]). From now on, we use the symbols $\{e(x) : x \in \mathbb{Q}/\mathbb{Z}\}$
and $\{\mu_a : a \in \mathbb{N}_+\}$. It is essential to split the short exact sequence in this way in order to
obtain the symmetry breaking phenomenon. In particular, this replacement of I by $I_+$ now
means that the group W acts by outer automorphisms. For x ∈ B, one has that $\mu_u^* x \mu_u$ is
still in B (computing in the larger algebra $C^*(A/R) \rtimes_\alpha I$), but now this defines an outer
action of W. This coincides with the definition of W as the symmetry group as in [2].


4 The Theorem of Bost-Connes


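Using the abstract description of the C*-algebra B of §3, to define the time evolution σ of
our dynamical system $(B, \sigma_t)$ it suffices to define it on the symbols $\{e(x) : x \in \mathbb{Q}/\mathbb{Z}\}$ and
$\{\mu_a : a \in \mathbb{N}_+\}$. For t ∈ R, let $\sigma_t$ be the automorphism of B defined by

$$\sigma_t(\mu_m) = m^{it} \mu_m, \quad m \in \mathbb{N}_+, \qquad \sigma_t(e(x)) = e(x), \quad x \in \mathbb{Q}/\mathbb{Z}. \qquad (11)$$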


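By (6) and (9) we clearly have that the action of W commutes with this 1-parameter group
$\sigma_t$. Hence W will permute the extremal KMS$_\beta$-states of $(B, \sigma_t)$. To describe the KMS$_\beta$-states
for β > 1, we shall represent $(B, \sigma_t)$ on a Hilbert space. Namely, following [2], let
$\mathcal{H}$ be the Hilbert space $\ell^2(\mathbb{N}_+)$ with canonical orthonormal basis $\{\varepsilon_m, m \in \mathbb{N}_+\}$. For each
u ∈ W, one has a representation $\pi_u$ of B in $B(\mathcal{H})$ given by,

$$\pi_u(\mu_m)\varepsilon_n = \varepsilon_{mn}, \qquad m, n \in \mathbb{N}_+$$

$$\pi_u(e(x))\varepsilon_n = \exp(2i\pi n \, u \circ x)\varepsilon_n, \qquad n \in \mathbb{N}_+, \ x \in \mathbb{Q}/\mathbb{Z}. \qquad (12)$$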


Here $u \circ x$ for u ∈ W and x ∈ Q/Z is the multiplication induced by transport of structure
using (5). One verifies easily that (12) does indeed give a C*-algebra representation of B.
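Part of this verification can be carried out numerically. The Python sketch below is an illustration under stated assumptions: it takes u = 1, restricts to a ∈ N_+ acting on Q/Z, and represents finitely supported vectors of ℓ²(N_+) as dictionaries {n: coefficient}. It then checks that the operators of (12) satisfy relation (8), in the form (1/a) Σ e(x) = μ_a e(y) μ_a^* with the sum over the a solutions of ax = y. All function names are hypothetical.

```python
import cmath
from fractions import Fraction


def mu(a, vec):
    """mu_a sends the basis vector eps_n to eps_{a n}."""
    return {a * n: coeff for n, coeff in vec.items()}


def mu_star(a, vec):
    """Adjoint of mu_a: eps_n goes to eps_{n/a} if a divides n, and to 0 otherwise."""
    return {n // a: coeff for n, coeff in vec.items() if n % a == 0}


def e(x, vec):
    """e(x) multiplies eps_n by exp(2 pi i n x), for x in Q/Z (case u = 1)."""
    return {n: coeff * cmath.exp(2j * cmath.pi * float(n * x))
            for n, coeff in vec.items()}


def averaged_e(a, y, vec):
    """(1/a) * sum of e(x) over the a solutions x = (y + k)/a of a x = y in Q/Z."""
    out = {}
    for k in range(a):
        for n, coeff in e((y + k) / a, vec).items():
            out[n] = out.get(n, 0) + coeff / a
    return out


def conjugated_e(a, y, vec):
    """mu_a e(y) mu_a^* applied to vec."""
    return mu(a, e(y, mu_star(a, vec)))


vec = {n: 1.0 for n in range(1, 13)}
left = averaged_e(3, Fraction(2, 5), vec)
right = conjugated_e(3, Fraction(2, 5), vec)
```

Both sides vanish on ε_n when 3 does not divide n, because the sum of the a-th roots of unity raised to the power n is zero in that case. On multiples of 3 both sides act by the same phase.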
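Let H be the unbounded operator in $\mathcal{H}$ whose action on the canonical basis is given by

$$H\varepsilon_n = (\log n)\varepsilon_n, \qquad n \in \mathbb{N}_+. \qquad (13)$$

Then clearly, for each u ∈ W, we have

$$\pi_u(\sigma_t(x)) = e^{itH} \pi_u(x) e^{-itH}, \qquad t \in \mathbb{R}, \ x \in B.$$

Notice that, for β > 1,

$$\mathrm{Trace}(e^{-\beta H}) = \sum_{n=1}^{\infty} \langle e^{-\beta H}\varepsilon_n, \varepsilon_n \rangle = \sum_{n=1}^{\infty} \langle n^{-\beta}\varepsilon_n, \varepsilon_n \rangle = \sum_{n=1}^{\infty} n^{-\beta},$$

so that the Riemann zeta function appears as a partition function of Gibbs state type. We
can now state the main result of [2].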


Theorem 1 (Bost-Connes) The dynamical system $(B, \sigma_t)$ has symmetry group W. The
action of u ∈ W is given by [u] ∈ Aut(B) where

$$[u] : e(y) \mapsto e(u \circ y), \quad y \in \mathbb{Q}/\mathbb{Z}, \qquad [u] : \mu_a \mapsto \mu_a, \quad a \in \mathbb{N}_+.$$

This action commutes with σ,

$$[u] \circ \sigma_t = \sigma_t \circ [u], \qquad u \in W, \ t \in \mathbb{R}.$$

Moreover, (1) for 0 < β ≤ 1, there is a unique KMS$_\beta$ state. (It is a factor state of Type III$_1$
with associated factor the Araki-Woods factor $R_\infty$.) (2) for β > 1 and u ∈ W, the state

$$\varphi_{\beta,u}(x) = \zeta(\beta)^{-1} \mathrm{Trace}(\pi_u(x) e^{-\beta H}), \qquad x \in B$$

is a KMS$_\beta$ state for $(B, \sigma_t)$. (It is a factor state of Type I$_\infty$.) The action of W on B induces
an action on these KMS$_\beta$ states which permutes them transitively and the map $u \mapsto \varphi_{\beta,u}$
is a homeomorphism of the compact group W onto the space $E_\beta$ of extremal points of the
simplex of KMS$_\beta$ states for $(B, \sigma_t)$. (3) the ζ function of Riemann is the partition function
of $(B, \sigma_t)$.


Part (1) of the above theorem is difficult and the reader is referred to [2] for complete details,
as for a full proof of (2). That for β > 1 the KMS$_\beta$-states given in part (2) fulfil Definition 1 of §2 is a
straightforward exercise. Notice that they have the form of Gibbs equilibrium states.
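The identification of the partition function with ζ(β) can be illustrated numerically. The short Python sketch below (the name `partition_function` and the cutoffs are assumptions of this example) truncates Trace(e^{-βH}) for the operator H of (13) to the first `cutoff` basis vectors.

```python
import math


def partition_function(beta, cutoff):
    """Truncated Trace(e^{-beta H}) with H eps_n = (log n) eps_n: sum of n^{-beta}, n <= cutoff."""
    return sum(math.exp(-beta * math.log(n)) for n in range(1, cutoff + 1))


z2 = partition_function(2.0, 100000)      # approximates zeta(2) = pi^2 / 6
z1_small = partition_function(1.0, 100)   # harmonic sums: no finite limit at beta = 1
z1_large = partition_function(1.0, 10000)
```

At β = 2 the truncated trace is close to π²/6, while at β = 1 the truncations keep growing with the cutoff, reflecting the pole of the zeta function at the critical inverse temperature.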


5 The space and group action in Connes’ approach to 
the Riemann hypothesis 


Theorem 1 solves Problem 1 of §2. More information is contained in its proof, however. As
mentioned in §1, given the existence of the Artin isomorphism in class field theory for
the rationals, one can recover the Galois action of $W$ explicitly. It is still an open problem to
exhibit this Galois action in terms of an analogue of $(B, \sigma_t)$ in a satisfactory way for general
number fields. Indeed, in [3] the analogue of $W$ is again an infinite product over unit groups
in the integers of local fields, completions of the number field at the prime ideals. But this is
no longer in general isomorphic to the Galois group of the maximal abelian extension of the
number field, which is the symmetry group one wants to recover. It appears that one must
generalise the approach of Bost-Connes so that the extremal KMS states, when restricted to
the basic symbols generating the C*-algebra, take values in the maximal abelian extension
of the number field. The action of the symmetry group on these extremal states should then
induce the action of the Galois group of the maximal abelian extension on these values.
Another feature occurs in the analysis of the proof of part (1) of Theorem 1. One can
treat the infinite places in a similar way to that already described for the finite places, so
working with the (full) adeles $\mathbb{A}$ and (full) ideles $J$. The ring of adeles $\mathbb{A}$ of $\mathbb{Q}$ consists of
the infinite vectors $(a_\infty, a_p)_p$ indexed by the archimedean place and the primes $p$ of $\mathbb{Q}$ with
$a_p \in \mathbb{Z}_p$ for all but finitely many $p$. The group $J$ of ideles consists of the infinite vectors
$(u_\infty, u_p)_p$ with $u_\infty \in \mathbb{R}$, $u_\infty \neq 0$ and $u_p \in \mathbb{Q}_p$, $u_p \neq 0$ and $|u_p|_p = 1$ for all but finitely
many primes $p$. There is a norm $|\cdot|$ defined on $J$ given by $|u| = |u_\infty|_\infty \prod_p |u_p|_p$. We have
natural diagonal embeddings of $\mathbb{Q}$ in $\mathbb{A}$ and $\mathbb{Q}^* = \mathbb{Q} \setminus \{0\}$ in $J$ induced by the embeddings
of $\mathbb{Q}$ into its completions. Notice that by the product formula $\mathbb{Q}^* \subset \mathrm{Ker}\,|\cdot|$. We define an
equivalence relation on $\mathbb{A}$ by $a \sim b$ if and only if there exists a $q \in \mathbb{Q}^*$ with $a = qb$. With
respect to this equivalence, we form the coset space $X = \mathbb{A}/\mathbb{Q}^*$. The ideles $J$ act on $\mathbb{A}$ by
componentwise multiplication, which induces an action of $C = J/\mathbb{Q}^*$ on $X$. Notice that this
action has fixed points. For example, whenever an adele $a$ has $a_p = 0$ it is a fixed point of
the embedding of $\mathbb{Q}_p^*$ into $J$ (to $q_p \in \mathbb{Q}_p^*$ one assigns the idele with 1 in every place except the
$p$th place). On the other hand, every Type III$_1$ factor has a continuous decomposition, that
is, it can be written as a crossed product of $\mathbb{R}$ with a Type II$_\infty$ factor. Connes has observed
that the von Neumann algebra of Type III$_1$ in the region $0 < \beta \le 1$ of Theorem 1 has in
its continuous decomposition the Type II$_\infty$ factor given by the crossed product of $L^\infty(\mathbb{A})$
by the action of $\mathbb{Q}^*$ by multiplication. The associated von Neumann algebra has orbit space
$X = \mathbb{A}/\mathbb{Q}^*$.
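The product formula used above, namely that the idele norm of a diagonally embedded non-zero rational equals 1, can be checked on examples. The following Python sketch is only an illustration under our own naming (the helper functions are hypothetical and not part of the original text); it uses exact rational arithmetic and trial division, so it is meant for small inputs only.

```python
from fractions import Fraction

def prime_factors(n):
    """Return the set of primes dividing the positive integer n (trial division)."""
    primes, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            primes.add(p)
            n //= p
        p += 1
    if n > 1:
        primes.add(n)
    return primes

def p_adic_abs(q, p):
    """p-adic absolute value |q|_p = p**(-v_p(q)) of a non-zero rational q."""
    v, num, den = 0, q.numerator, q.denominator
    while num % p == 0:   # powers of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:   # powers of p in the denominator lower it
        den //= p
        v -= 1
    return Fraction(p) ** (-v)

def idele_norm_of_rational(q):
    """|q| = |q|_infinity * prod_p |q|_p for q in Q* embedded diagonally in J."""
    norm = abs(q)
    # only primes dividing the numerator or denominator contribute a factor != 1
    for p in prime_factors(abs(q.numerator)) | prime_factors(q.denominator):
        norm *= p_adic_abs(q, p)
    return norm
```

For instance, `idele_norm_of_rational(Fraction(-12, 35))` multiplies the archimedean value 12/35 by the factors 1/4, 1/3, 5 and 7 at the primes 2, 3, 5 and 7.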

The pair $(X, C)$ plays a fundamental role in Connes’s proposed approach to the Riemann
hypothesis in [5]. We remark that Connes’s approach does not as yet reprove the established
analogue of the Riemann hypothesis in the function field case: if it did so this would provide an alternative
proof. Indeed, even for the case of the projective line, where the Riemann hypothesis is
trivial, the approach of Connes demands an analysis not only of the counting of points over
finite fields on the projective line, but also an understanding of the moduli space of vector
bundles over the projective line, due to the fact that all the characters of the idèle class group
also intervene. In order for his approach to work, Connes would need to be able to prove
an asymptotic formula for the trace of the action of $C$ on $X$. A “spectral” analysis of this
action, related to older ideas of Weil [17] and Tate, shows it to be related to the non-trivial
zeros of the L-functions with Grössencharakter, and a heuristic “geometric” analysis relates
this action to the Weil distribution. The “geometric” heuristics aim at suggesting a proof
for the positivity of the Weil distribution, which is known to imply the Riemann hypothesis.
In the subsequent sections we give a leisurely introduction to these ideas.


6 A trace formula over the reals


There is a rigorous local version of the geometric side of the asymptotic trace formula of
[6]. For simplicity we work at the infinite place, the discussion at the prime places being
analogous, and justify the formula in terms of the simple computation of the distributional
trace. Consider the action of $\mathbb{R}^* = \mathbb{R} \setminus \{0\}$ on $\mathbb{R}$ by multiplication. This induces the action
on smooth functions on $\mathbb{R}$,

$U(u)\xi = \xi \circ u^{-1}, \quad u \in \mathbb{R}^*, \ \xi \in C^\infty(\mathbb{R}).$ (14)


We average this action over a function of rapid decay. Namely, for $h \in S(\mathbb{R}^*)$, we let $U(h)$
be the operator in $L^2(\mathbb{R})$ given by,


$U(h) = \int_{\mathbb{R}^*} h(u)\,U(u)\,d^*u,$ (15)

where the multiplicative Haar measure $d^*u$ is normalised by,

$\int_{|u| \in [1,\Lambda]} d^*u \sim \log \Lambda, \quad \Lambda \to \infty.$ (16)

The associated kernel of $U(u)$ is

$k(x, u^{-1}, y) = \delta(y - u^{-1}x),$ (17)

and for $u \neq 1$,

$\int_{\mathbb{R}} k(x, u^{-1}, x)\,dx = |1 - u^{-1}|^{-1}.$ (18)


If $h \in S(\mathbb{R}^*)$ and $h(1) = 0$, we define the distributional trace of $U(h)$ to be,


$\text{“Trace”}(U(h)) = \int_{\mathbb{R}^*} \frac{h(u^{-1})}{|1 - u|}\,d^*u.$ (19)
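The passage from the kernel (17) to the distributional trace (19) can be spelled out as follows. This is only a formal sketch, assuming that the delta function may be integrated by the usual change of variables and that $d^*u$ is invariant under $u \mapsto u^{-1}$; it is not taken from the original text.

```latex
% Diagonal integral of the kernel of U(u), for u \neq 1:
% substitute y = (1 - u^{-1}) x, so that dx = dy / |1 - u^{-1}|.
\int_{\mathbb{R}} \delta\bigl(x - u^{-1}x\bigr)\,dx
  = \int_{\mathbb{R}} \delta(y)\,\frac{dy}{|1 - u^{-1}|}
  = \frac{1}{|1 - u^{-1}|}.

% Averaging against h with h(1) = 0, then replacing u by u^{-1}
% (the multiplicative Haar measure d^*u is invariant under inversion):
\text{``Trace''}\bigl(U(h)\bigr)
  = \int_{\mathbb{R}^*} \frac{h(u)}{|1 - u^{-1}|}\,d^*u
  = \int_{\mathbb{R}^*} \frac{h(u^{-1})}{|1 - u|}\,d^*u .
```

The condition $h(1) = 0$ is what keeps the integrand from being singular at $u = 1$, where the diagonal integral diverges.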




To state an ordinary trace formula in the general case $h(1) \neq 0$, we introduce the principal
part of the integral in (19), defined by


$\int'_{\mathbb{R}^*} \frac{h(u^{-1})}{|1 - u|}\,d^*u = \left\langle L, \frac{h(u^{-1})}{|u|} \right\rangle,$ (20)

where $L$ is the unique distribution on $\mathbb{R}$ which agrees with $\frac{du}{|1-u|}$ for $u \neq 1$ and whose Fourier
transform vanishes at 1. If $g(u) = h((u+1)^{-1})\,|u+1|^{-1}$, then from [6], §V, we have the
formula


$\int'_{\mathbb{R}} g(u)\,\frac{du}{|u|} = -\int_{\mathbb{R}} \hat{g}(u)\,\log|u|\,du.$ (21)
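Let $\Lambda > 0$ and let $P_\Lambda$ be the orthogonal projection onto the subspace

$\{\xi \in L^2(\mathbb{R}) : \xi(x) = 0 \ \text{for} \ |x| > \Lambda\},$ (22)

that is, multiplication by the characteristic function of $[-\Lambda, \Lambda]$.
Let $F$ denote the Fourier transform in $L^2(\mathbb{R})$ and $\hat{P}_\Lambda = F P_\Lambda F^{-1}$. Let $R_\Lambda = \hat{P}_\Lambda P_\Lambda$. In [6]
the following is proved.


Proposition 1 If $h \in S_c(\mathbb{R}^*)$, then $R_\Lambda U(h)$ is a trace class operator in $L^2(\mathbb{R})$ and as
$\Lambda \to \infty$ we have the asymptotic formula


$\mathrm{Trace}(R_\Lambda U(h)) = 2h(1)\log\Lambda + \int'_{\mathbb{R}^*} \frac{h(u^{-1})}{|1 - u|}\,d^*u + o(1).$ (23)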


An analogous result holds for the local fields $\mathbb{Q}_p$ and also if one considers a finite number
of places at a time. The problem is that the error term then implied by the “o(1)” is not
uniform in the set of primes chosen so that one cannot pass to the limit over all primes.
This seems to represent a deep difficulty of an analytic number theoretic nature.


7 The Polya-Hilbert space 


Connes proposes a Polya-Hilbert space for the problem in the following way. Let $S(\mathbb{A})_0$
denote the subspace of $S(\mathbb{A})$ given by,


$S(\mathbb{A})_0 = \{f \in S(\mathbb{A}) : f(0) = \int f\,dx = 0\}.$ (24)


Let $E$ be the averaging over $\mathbb{Q}^*$ operator which to $f \in S(\mathbb{A})_0$ associates the element of $S(C)$
given by


$E(f)(u) = |u|^{1/2} \sum_{q \in \mathbb{Q}^*} f(qu).$ (25)


For $\delta \ge 0$, let $L^2(X)_{0,\delta}$ be the completion of $S(\mathbb{A})_0$ with respect to the norm given by,


$\|f\|_\delta^2 = \int_C |E(f)(u)|^2 (1 + \log^2|u|)^{\delta/2}\,d^*u, \quad \text{for} \ \int_{|u| \in [1,\Lambda]} d^*u \sim \log\Lambda, \ \Lambda \to \infty.$ (26)


If $g(x) = f(qx)$ for some fixed $q \in \mathbb{Q}^*$, then $\|g\|_\delta = \|f\|_\delta$ and so one sees that this norm
respects, in this sense, the passage to the quotient $\mathbb{A}/\mathbb{Q}^*$. We define $L^2(X)_\delta$ by the short
exact sequence

$0 \to L^2(X)_{0,\delta} \to L^2(X)_\delta \to \mathbb{C} \oplus \mathbb{C}(1) \to 0.$ (27)

When $\delta = 0$ we write $L^2(X)_0$ and $L^2(X)$ for the first two terms. Here $\mathbb{C}$ is the trivial
$C$-module and $\mathbb{C}(1)$ is the $C$-module for which $u \in C$ acts by $|u|$, where $|\cdot|$ is the norm on
$C$. Multiplication of $C$ on $\mathbb{A}$ induces a representation of $C$ on $L^2(X)$ given by,

$(U(\lambda)\xi)(x) = \xi(\lambda^{-1}x).$ (28)

We introduce a Hilbert space $H_\delta$ via another short exact sequence,

$0 \to L^2(X)_{0,\delta} \to L^2(C)_\delta \to H_\delta \to 0.$ (29)

Here $L^2(C)_\delta$ is the completion with respect to the weighted Haar measure as in (26), where
we write $L^2(C)$ when $\delta = 0$. The spectral interpretation on $H_\delta$ of the critical zeros of the
L-functions in [5] relies on taking $\delta > 0$. Indeed, this is needed to control the growth of
the functions on the non-compact quotient $X$: ultimately this parameter is eliminated from the
conjectural trace formula by using cut-offs. It is important here to use the measure $|u|\,d^*u$
(implicit in (26)) instead of the additive Haar measure $dx$, this difference being a veritable
one for global fields, where one has $dx = \lim_{\epsilon \to 0} \epsilon\,|x|^{1+\epsilon}\,d^*x$.

The regular representation descends to $H_\delta$ (it commutes with $E$ up to a phase as an easy
calculation shows) and we denote it by $W$. Connes describes $(H_\delta, W)$ as the Polya-Hilbert
space with group action for his approach to the Riemann hypothesis. He proves in [5] and
[6] the following remarkable result relating the trace of this action to the zeros on the critical
line of the L-functions with Grössencharakter.


Theorem 2 For any Schwartz function $h \in S(C)$ the operator

$W(h) = \int_C W(u)\,h(u)\,d^*u$

in $H_\delta$ is trace class, and its trace is given by

$\mathrm{Trace}(W(h)) = \sum_{L(\chi, \frac{1}{2} + is) = 0,\ s \in \mathbb{R}} \hat{h}(\chi, is)$

where the sum is over the characters $\chi$ of $C$ with the multiplicity of the zero being counted
as the largest integer $n < \frac{1}{2}(1 + \delta)$ with $n$ at most the multiplicity of $\frac{1}{2} + is$ as a zero of
$L(\chi, z)$. Moreover, we define,

$\hat{h}(\chi, z) = \int_C h(u)\,\chi(u)\,|u|^z\,d^*u.$


Now, the action of $C$ is free on $L^2(C)_\delta$ so that the short exact sequence (29) tells us that
the trace of the action of $C$ on $H_\delta$ should be, up to a correction due to a regularisation, the
negative of the trace of the action of $C$ on $L^2(X)_{0,\delta}$. From (27), we see that the regularised
trace of the action of $C$ on $L^2(X)_\delta$ should involve the sum of the corresponding trace on
$L^2(X)_{0,\delta}$ and the trace on $\mathbb{C} \oplus \mathbb{C}(1)$. Therefore the regularised trace of the action of $C$ on
$L^2(X)_\delta$ should involve the trace of the action on $\mathbb{C} \oplus \mathbb{C}(1)$ minus the trace of this action on
$H_\delta$. This minus sign is crucial for the comparison with the Weil distribution.
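The sign bookkeeping in the two short exact sequences (27) and (29) can be summarised schematically. The display below is only a heuristic sketch: the symbol $\doteq$ is our own shorthand for "equal up to regularisation", and additivity of the regularised trace along exact sequences is assumed rather than proved.

```latex
% Additivity along (27):
\mathrm{Tr}\bigl(L^2(X)_\delta\bigr) \doteq
  \mathrm{Tr}\bigl(L^2(X)_{0,\delta}\bigr) + \mathrm{Tr}\bigl(\mathbb{C}\oplus\mathbb{C}(1)\bigr).

% Additivity along (29); the middle term is discarded because C acts
% freely on L^2(C)_\delta, so it carries no fixed-point contribution:
\mathrm{Tr}\bigl(H_\delta\bigr) \doteq
  \mathrm{Tr}\bigl(L^2(C)_\delta\bigr) - \mathrm{Tr}\bigl(L^2(X)_{0,\delta}\bigr)
  \doteq -\,\mathrm{Tr}\bigl(L^2(X)_{0,\delta}\bigr).

% Combining the two:
\mathrm{Tr}\bigl(L^2(X)_\delta\bigr) \doteq
  \mathrm{Tr}\bigl(\mathbb{C}\oplus\mathbb{C}(1)\bigr) - \mathrm{Tr}\bigl(H_\delta\bigr).
```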




The following result going back to Weil [17] shows how the operation $E$ in (25) brings in
the zeros of the L-functions in the critical strip and provides the key to the proof of Theorem 2
and the appearance of the non-trivial zeros of the $L(\chi, s)$ in Proposition 3. It indicates that
the non-trivial zeros of the L-functions should “span” $H_\delta$ as they are “orthogonal” to the
image of $E$.


Proposition 2 Let $\chi$ be a character on $C$. For any $\rho \in \mathbb{C}$ with $\mathrm{Re}(\rho) \in\ ]-\frac{1}{2}, \frac{1}{2}[$, we have


$\int_C E(\xi)(u)\,\chi(u)\,|u|^\rho\,d^*u = 0, \quad \text{for all} \ \xi \in S(\mathbb{A})_0$


precisely when $L(\chi, \frac{1}{2} + \rho) = 0$.


8 Relation to the Weil distribution 


In [6] Connes works with a cut-off in order to avoid the parameter $\delta$ of §7. He introduces
a family of subspaces $B_{\Lambda,0}$ of $L^2(X)_0$, depending on a real parameter $\Lambda > 0$, such that
$E(B_{\Lambda,0}) \subset S_\Lambda$ where,

$S_\Lambda = \{\xi \in L^2(C) : \xi(u) = 0 \ \text{for} \ |u| \notin [\Lambda^{-1}, \Lambda]\}.$ (30)


In the case of global fields of positive characteristic it makes sense to introduce $B_{\Lambda,0}$ as
the space spanned by the $f \in S(\mathbb{A})_0$ that together with their Fourier transform vanish for
$|x| > \Lambda$. In the number field case, all such functions (on the reals) are trivial, and $B_{\Lambda,0}$ is
described in terms of prolate spheroidal functions. This aspect of [6] is very technical and
we do not attempt to go into it here as it offers little additional insight into the underlying
ideas. As the asymptotic trace formula involving them is conjectural, we assume for our
present purpose the existence of an appropriate family of $B_{\Lambda,0}$. Let $Q_{\Lambda,0}$ be the orthogonal
projection onto $B_{\Lambda,0}$ and $Q'_{\Lambda,0} = E\,Q_{\Lambda,0}\,E^{-1}$. By assumption,

$Q'_{\Lambda,0} \le S_\Lambda,$ (31)

where $S_\Lambda$ also denotes the projection onto the set $S_\Lambda$ in (30). Therefore, for all $\Lambda > 0$, the
following distribution is positive,

$\Delta_\Lambda(f) = \mathrm{Trace}((S_\Lambda - Q'_{\Lambda,0})\,V(f)), \quad f \in S(C),$ (32)

where $V$ is the regular representation of $C$ on $L^2(C)$ and $V(f) = \int_C f(u)\,V(u)\,d^*u$. The
positivity of $\Delta_\Lambda$ signifies that for $f \in S(C)$ we have

$\Delta_\Lambda(f * f^*) \ge 0,$ (33)

where $f^*(u) = \overline{f(u^{-1})}$. Therefore the limiting distribution is also positive,


$\Delta_\infty = \lim_{\Lambda \to \infty} \Delta_\Lambda \ge 0.$ (34)
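The reason why the operator inequality (31) yields the positivity (33) can be made explicit. The following is a sketch under the assumptions that $V$ is unitary, that $f^*(u) = \overline{f(u^{-1})}$, and that all traces involved exist so that cyclicity of the trace may be used; these hypotheses are ours and are not spelled out in the text.

```latex
% Convolution goes to composition and the involution to the adjoint:
V(f * f^*) = V(f)\,V(f^*) = V(f)\,V(f)^* .

% Hence, by cyclicity of the trace,
\Delta_\Lambda(f * f^*)
  = \mathrm{Trace}\bigl((S_\Lambda - Q'_{\Lambda,0})\,V(f)\,V(f)^*\bigr)
  = \mathrm{Trace}\bigl(V(f)^*\,(S_\Lambda - Q'_{\Lambda,0})\,V(f)\bigr) \;\ge\; 0,

% because S_\Lambda - Q'_{\Lambda,0} \ge 0 by (31), and T^* A T \ge 0
% whenever A \ge 0.
```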


In [6] the following conjecture is formulated, together with a proposed construction of the
family $B_{\Lambda,0}$.


Conjecture 1 One can find subspaces $B_{\Lambda,0}$ of $L^2(X)_0$ such that the distributions $\Delta_\Lambda$ are of
positive type and converge to the Weil distribution, so that for $h \in S(C)$ of compact support,


$\Delta_\infty(h) = \int_C h(u)\,(|u|^{1/2} + |u|^{-1/2})\,d^*u - \sum_v \int'_{\mathbb{Q}_v^*} \frac{h(u^{-1})}{|1 - u|}\,|u|^{1/2}\,d^*u$ (35)


where $v$ runs over the places of $\mathbb{Q}$.


It is known that the positivity of the Weil distribution, which would follow from Conjecture
1 using (34), implies the Riemann hypothesis (see for example [14]). One can view $\Delta_\Lambda$, via
(32), as a distribution associated to the regularised trace of the action of $C$ on the Hilbert
space $H_\delta$ of §7 with the cut-off at $|u| = \Lambda$ enabling one to take $\delta = 0$. The factor $|u|^{1/2}$ in
the integrands of (35) as opposed to those of (23) is due to the passage to the quotient by
the image of $E$.

We can reformulate Conjecture 1 in terms of the sequence of projections $Q_\Lambda$ where $Q_\Lambda$
is the projection in $L^2(X)$ onto $B_{\Lambda,0} \oplus \mathbb{C} \oplus \mathbb{C}(1)$ as follows.


Conjecture 2 There is a sequence of closed projections $Q_\Lambda$, $\Lambda > 0$, in $L^2(X)$ extending
the $Q_{\Lambda,0}$ above such that for $h \in S(C)$ of compact support we have, as $\Lambda \to \infty$,


$\mathrm{Trace}(Q_\Lambda U(h)) = 2h(1)\log'\Lambda + \sum_v \int'_{\mathbb{Q}_v^*} \frac{h(u^{-1})}{|1 - u|}\,d^*u + o(1),$ (36)


where the sum is over the places $v$ of $\mathbb{Q}$. Here $2\log'\Lambda = \int_{|u| \in [\Lambda^{-1},\Lambda]} d^*u$, which is asymptotic
to $2\log\Lambda$ for the correct choice of Haar measure $d^*u$ on $C$.


9 Relation of the limit distribution to the zeros


The distributions $\Delta_\Lambda$ can be regarded as a sequence of distributions attached to the Polya-Hilbert
space and action $(H_\delta, W)$ of §7. We have the following interpretation of $\Delta_\infty$ in
terms of the zeros of the L-functions with Grössencharakter: compare with Theorem 2 and
notice how the use of the asymptotics of the cut-off has eliminated the parameter $\delta$.


Proposition 3 Let $h$ be a function of compact support in $S(C)$. Then,

$\Delta_\infty(h) = \sum_{\chi,\rho} N(\chi, \tfrac{1}{2} + \rho) \int_{z \in i\mathbb{R}} \hat{h}(\chi, z)\,d\mu_\rho(z)$ (37)

where the sum is over the pairs $(\chi, \rho)$ of characters of $C$ and zeros $\rho$ of $L(\chi, \frac{1}{2} + \rho)$ with
$\mathrm{Re}(\rho) \in\ ]-\frac{1}{2}, \frac{1}{2}[$. The number $N(\chi, \frac{1}{2} + \rho)$ is the multiplicity of the zero, the measure $d\mu_\rho(z)$
is the harmonic measure of $\rho$ with respect to the line $i\mathbb{R} \subset \mathbb{C}$ and the Fourier transform $\hat{h}$
of $h$ is defined by


$\hat{h}(\chi, \rho) = \int_C h(u)\,\chi(u)\,|u|^\rho\,d^*u.$ (38)

In [6] Proposition 3 is proven in the function field case, and the proof for number fields
is outlined. It allows one to derive, under Conjecture 1, the following conjectural explicit
formula.


Conjecture 3 Let $h \in S(C)$ be of compact support, then

$\sum_v \int'_{\mathbb{Q}_v^*} \frac{h(u^{-1})}{|1 - u|}\,|u|^{1/2}\,d^*u = \int_C h(u)\,(|u|^{1/2} + |u|^{-1/2})\,d^*u - \sum_{\chi,\rho} N(\chi, \tfrac{1}{2} + \rho) \int_{z \in i\mathbb{R}} \hat{h}(\chi, z)\,d\mu_\rho(z)$


where $v$ runs over the places of $\mathbb{Q}$.




Proposition 3 gives an interpretation of the non-trivial zeros of the Riemann zeta function
with the non-critical zeros appearing as resonances for the harmonic measure with respect
to the real line. Conjecture 3 is the global analogue of Proposition 1 and can be thought
of as a conjectural trace formula for the action of $C$ on $L^2(X)$. In accordance with the
comments preceding Proposition 2, one sees that for the action of $C$ on $L^2(X)$, the critical
zeros would enter the trace formula with a minus sign, which Connes interprets as their
being an absorption spectrum for the action of $C$ on $\mathbb{A}/\mathbb{Q}^*$. This represents an important
new feature over other proposed spectral models to date coming from physics. It is in
complete agreement with the intuitions gleaned from the proofs in the function field case
that yield a spectral interpretation of the zeros as the eigenvalues of the action of Frobenius
on $\ell$-adic cohomology. The analogue of the Polya-Hilbert space is given, in the curve case (if
one replaces $\mathbb{C}$ by $\mathbb{Q}_\ell$, with $\ell \neq p$ where $p$ is the base field characteristic) by the 1st degree
étale cohomology group on the curve with coefficients in $\mathbb{Q}_\ell$. This group appears with an
overall minus sign in the Lefschetz formula. In other words, the spectral interpretation of
the zeros of the Riemann zeta function should be as an absorption spectrum rather than as
an emission spectrum, if one uses the language of spectroscopy.


10 A global geometric trace formula 


One of the most interesting heuristic aspects of Connes’s ideas, which appears already in
[5], is the interpretation of the global analogue of Proposition 1 as a non-commutative trace
formula for a flow given in Conjecture 2. In [5], the validity of Conjecture 2 was interpreted as
the existence of a noncommutative generalisation of a distributional trace formula for flows
on compact manifolds (see [9]). If $M$ is a smooth compact manifold with an everywhere
non-vanishing vector field $\xi$, then the associated flow is given by the 1-parameter group
$\{F_t = \exp t\xi\}_{t \in \mathbb{R}}$ which induces an action of $\mathbb{R}$ on smooth functions on $M$ by,


$U(t)f = f \circ F_t, \quad f \in C^\infty(M).$ (39)


For $h \in S(\mathbb{R})$, consider the operator,

$U(h) = \int_{\mathbb{R}} U(t)\,h(t)\,dt,$ (40)


where we suppose for simplicity that $h(0) = 0$. Then we know from [9] that the distributional
trace is given by,


$\text{“Trace”}(U(h)) = \sum_\gamma \int_{H_\gamma} \frac{h(t)}{|1 - P_t|}\,d^*t$ (41)


where the sum is over the primitive periodic orbits $\gamma$ and where $H_\gamma$ is the isotropy group
of any $x \in \gamma$ with measure $d^*t$ normalised so that $H_\gamma$ has covolume 1. We have denoted
by $P_t$ the restriction of the differential $d(\exp(t\xi))_x$ to the space transverse to the orbits (the
Poincaré map) and by $|1 - P_t|$ the absolute value of the determinant of $1 - P_t$ which is
assumed non-degenerate. Now, observe the following fact about the pair $(X, C)$ proved in
[5].

Lemma 3 For $x \in X$, $x \neq 0$, the isotropy group of $x$ in $C$ is cocompact if and only if there
exists exactly one place $v$ of $\mathbb{Q}$ with $x_v = 0$ where $(x_\infty, x_p)_p$ is any lift of $x$ to $\mathbb{A}$.


If $x \in \mathbb{A}$ with $x_v = 0$ at only one place $v$ of $\mathbb{Q}$, then the isotropy group of $x$ in $C$ is
isomorphic to $\mathbb{Q}_v^*$ and the transverse space is isomorphic to $\mathbb{Q}_v$. Hence the analogue of the
Poincaré return map for this fixed point is multiplication of $\mathbb{Q}_v^*$ on $\mathbb{Q}_v$ whose trace formula
we discussed in §6 for $v = \infty$. We therefore write suggestively $|1 - P_u| = |1 - u|$, $u \in \mathbb{Q}_v^*$.
Let $h \in S(C)$ with compact support and $h(1) = 0$. The analogue of the trace formula
(41) becomes, once we agree to single out the fixed points with cocompact isotropy group
determined by Lemma 3,


$\text{“Trace”}(U(h)) = \sum_v \int_{\mathbb{Q}_v^*} \frac{h(u^{-1})}{|1 - u|}\,d^*u.$ (42)


This is consistent with the local computation of §6 when $h(1) = 0$ and introduces the sum
over local terms in (35) as coming from an analogue for $\mathbb{A}/\mathbb{Q}^*$ of a geometric trace formula
for flows.

These heuristics show that the validity of the Riemann hypothesis would imply that the 
above analogue of the distributional trace formula for flows on manifolds makes sense for 
the pair (X,C). This is remarkable in light of the fact that the space X displays none of 
the regularity properties needed in the proof of the manifold case. 


11 Concluding remarks 


The work of Bost-Connes and Connes leads naturally to investigating $(\mathbb{A}/\mathbb{Q}^*, C)$ as the
geometric site and $(H_\delta, W)$ as the spectral site for the study of the generalised Riemann
Hypothesis. Although much of Connes’s suggested approach to the Riemann hypothesis
is conjectural, it seems likely that a deeper understanding of the space $\mathbb{A}/\mathbb{Q}^*$ should have
interesting consequences. The analogy between the trace formula for flows on smooth compact
manifolds of [9] and the conjectured trace formula in (42) suggests the interpretation
of Weil’s explicit formula as a trace formula on $\mathbb{A}/\mathbb{Q}^*$. It should be noted that Paul Cohen
proposed in unpublished remarks in the 1970’s that in order to understand the Riemann
Hypothesis one should understand the nature of measurable $\mathbb{Q}^*$-invariant functions on $\mathbb{A}$.
The fact that the von Neumann algebra associated to $\mathbb{A}/\mathbb{Q}^*$ is of Type II$_\infty$, as shown by
the work of Bost-Connes, indicates that it is not possible to do classical measure (Type I)
theory on $\mathbb{A}/\mathbb{Q}^*$. Non-classical ideas are therefore needed and the work of Connes is a bold
step in that direction.

Some very recent ideas of Connes expressed at the AMS/Clay Institute conference on 
Noncommutative Geometry in Mount Holyoke in June 2000 indicate that he may be able to 
reexpress the conjectured trace formulae in terms of indices associated to Fredholm modules 
in the sense of Connes’s noncommutative geometry. This would lay the investigation open 
to the techniques of that theory. 

To date, no spectral interpretation of the zeros, even in the function field case, has 
yielded in itself a proof of the Riemann hypothesis: something else is always needed. In the 
function field case this extra ingredient is Castelnuovo positivity. In one of Weil’s proofs 
of the curve case over finite fields, he uses the explicit formula (which Connes conjectures 
can be described as an appropriate trace formula) together with the proof that the Weil 
distribution is positive. By contrast, Connes in [6] worked by construction with the Hilbert
space $H_\delta$ where positivity was ensured by definition and conjectured an appropriate explicit
formula, thereby reversing in some sense the logic of the proofs in the curve case. In the
more recent approach [7], positivity is part of the problem. One constructs a sequence of 
projections which are positive and the difficult part is to prove that their limit gives the 
Weil distribution. The spectral interpretation is there, but once more it is not enough. 

The conjectures and analogies in Connes’s work seem to offer much more than an approach
to the Riemann hypothesis. They suggest a fruitful interaction between the theories
of operator algebras and C*-algebras and the analytic and algebraic theories of numbers. A
likely source for the immediate future of a new input into number theory is Connes’s outline
of a noncommutative Brauer theory given in [7]. Even if the Riemann hypothesis is proven
by techniques independent of those proposed by Connes, the set-up he is proposing could
then step in armed with the validity of the Riemann hypothesis to view arithmetic in a new
and exciting way. An immediate challenge is therefore to understand and exploit Connes’s
approach in the function field case where the Riemann hypothesis is known. This has not
even been carried out for the case of the projective line where the analysis should be most
tractable.


References 


[1] J. ARLEDGE, M. LACA, I. RAEBURN, Semigroup crossed products and
Hecke algebras arising from number fields, Doc. Mathematica 2 (1997),
115-138.


[2] J-B. BOST, A. CONNES, Hecke Algebras, Type III factors and phase transitions
with spontaneous symmetry breaking in number theory, Selecta Math.
(New Series), 1, 411-457 (1995).


[3] P.B. COHEN, A C*-dynamical system with Dedekind zeta partition function
and spontaneous symmetry breaking, J. Théorie des Nombres de Bordeaux,
11 (1999), 15-30.


[4] A. CONNES, Noncommutative Geometry, Academic Press, 1994. (Version
française: Géométrie non commutative, InterEditions, Paris, 1990.)


[5] A. CONNES, Formule de trace en géométrie non commutative et hypothèse de
Riemann, C.R. Acad. Sci. Paris, t. 323, Série I (Analyse), 1231-1236 (1996).


[6] A. CONNES, Trace formula in Noncommutative Geometry and the zeros of
the Riemann zeta function, Selecta Mathematica, New Series 5, n. 1 (1999),
29-106.


[7] A. CONNES, Noncommutative geometry and the Riemann zeta function, in
Mathematics: frontiers and perspectives, eds. Vladimir Arnold ... [et al.], IMU,
AMS 1999, 35-54.


[8] J. DIXMIER, Les C*-algèbres et leurs représentations, Gauthier-Villars,
Paris, 1964.


[9] V. GUILLEMIN, Lectures on spectral theory of elliptic operators, Duke Math.
J., 44 no. 3, 485-517 (1977).


[10] D. HARARI, E. LEICHTNAM, Extension du phénomène de brisure spontanée
de symétrie de Bost-Connes au cas des corps globaux quelconques, Selecta
Mathematica, New Series 3 (1997), 205-243.


[11] B. JULIA, Statistical Theory of Numbers, in Number Theory and Physics,
Springer Proceedings in Physics, Vol. 47, 1990.


[12] N. KATZ, P. SARNAK, Random matrices, Frobenius eigenvalues and Monodromy,
AMS Colloquium Publications 45, 1999.


[13] K.R. PARTHASARATHY, An Introduction to Quantum Stochastic Calculus,
Monographs in Mathematics Vol. 85, Birkhäuser, 1992.


[14] S. PATTERSON, An Introduction to the Riemann zeta function, Cambridge
studies in advanced mathematics Vol 14, Cambridge Uni. Press, 1988.


[15] B. RIEMANN, Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse,
Monat der Königl. Preuss. Akad. der Wissen. zu Berlin aus der Jahre 1859
(1860), 671-680; also, Gesammelte math. Werke und wissensch. Nachlass, 2.
Aufl. 1892, 145-155. English translation in H.M. EDWARDS, Riemann’s zeta
function, Academic Press, New York-London, 1974.


[16] E.C. TITCHMARSH, The Theory of the Riemann Zeta-function, Second Edition
revised by D.R. Heath-Brown, Oxford University Press, New York,
1988.


[17] A. WEIL, Fonctions zeta et distributions, Séminaire Bourbaki 312 (1966).


Paula B. Cohen


School of Mathematics
Institute for Advanced Study
Einstein Drive
Princeton, NJ 08540, USA
pcohen@math.ias.edu


UMR AGAT au CNRS
Mathématiques, Bât M2
UFR de Mathématiques
Villeneuve d’Ascq, 59655, FRANCE
pcohen@agat.univ-lille1.fr




A HYPERELLIPTIC CURVE 
WITH REAL MULTIPLICATION OF DEGREE TWO 


HARVEY COHN 


Abstract. The analogue of complex multiplication in an elliptic curve is illustrated for a 
hyperelliptic curve with real multiplication of degree two (over C). Humbert’s equation, 
which characterizes such a curve, was derived in 1899 by an elegant tour de force of contemporary
analysis and geometry; but this equation can be derived very simply by use of computer 
algebra on the abelian integrals, leading directly to a sufficient proof of Humbert’s equation 
from his remarkable conic configuration. A set of suitable hyperelliptic coefficient parameters 
is also introduced. The relation to Hilbert modular functions is outlined in an appendix. 


0. Summary. This all began with the AGM (arithmetic-geometric mean) of Gauss
(1799), (see [1], [5]), which became classically interpreted as a mapping of one elliptic curve
in two-to-one fashion onto another. Since elliptic curves have one (coefficient) parameter,
this means, loosely speaking, for some $s \to s_0$:


$\{y^2 = x(x-1)(x-s^2)\} \to \{y_0^2 = x_0(x_0-1)(x_0-s_0^2)\}.$


Then when $s = s_0$, the curves are the same so these (discrete) values of $s$ become “singular
values” with “complex multiplication.” (See §1-4 for definitions and other details).

If we deal with hyperelliptic curves (of genus two), then there was a corresponding AGM
found by Richelot (1837), (see [1]), also later interpreted as a correspondence between
two such curves (giving rise to an isogeny between jacobians with kernel isomorphic to
$\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$). Since these curves have three (coefficient) parameters (see [3]), this means,
more loosely speaking, for some $(s, t, u) \to (s_0, t_0, u_0)$:


$\{y^2 = x(x-1)(x-s^2)(x-t^2)(x-u^2)\} \to \{y_0^2 = x_0(x_0-1)(x_0-s_0^2)(x_0-t_0^2)(x_0-u_0^2)\}.$


This cannot be a two-to-one mapping as in the case of genus one, but a mapping of abelian
integrals in a manner described below. We then have a relationship due to Georges Humbert
(1899), (see [10]), reducing the curves to two parameters, which he called “singular
cases” with “real multiplication.” (See §5-7 for definitions and other details).

Either complex or real multiplication is “singular” in the sense that there is a loss of 
dimension of the parameters. 

The real cases were derived by Humbert as a culmination of nineteenth century state 
of the art, with the use of Kummer surfaces, theta functions, and projective invariants. 
In this paper, we shall justify Humbert’s results by computer on a more elementary and 
(basically) self-contained level, by working with the abelian integrals using MAPLE. 


Presented to the New York Number Theory Seminar under the title “Analogies between real and complex
multiplication.”
Mathematics Subject Classification. Primary 14K22, 11F41. Key words and phrases. Complex multiplication,
real multiplication, Jacobi manifold, Humbert’s criterion.

Typeset by AMS-TeX




D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 




Actually, the two singular hyperelliptic parameters are algebraically related to the two
variables of the field of Hilbert modular functions for $\mathbb{Q}(\sqrt{2})$ (as developed by Hecke and
Siegel, see [7], [6], [4], [8], [16], [18]). This relation apparently is not known explicitly, but
no further use is made of Hilbert modular functions here.

The author had previously worked in this area, mostly on modular equations [4] for
the (transcendental) Hilbert modular invariants. He is therefore all the more cognizant of
his debt to Armand Brumer, Mark Heiligman, Everett Howe, Bjorn Poonen, and Jerome
Solinas for useful conversations which eased the transition to (algebraic) coefficient parameters.

There is obviously a prepossessing need to know that the transcendental and algebraic
results are part of the same “language,” and the author notes that Armand Brumer informs
him of prior (unpublished) results in this area. The author also notes previous papers of
J.-F. Mestre ([18], [14], [15]) exploring Humbert’s methods and also [3], [17] sources of
reference.


COMPLEX MULTIPLICATION OF DEGREE TWO 


1. The period structure. To recapitulate the classic result, (see [5]), consider a
lattice generated by two complex vectors, 1 and $\sqrt{-2}$,


(1.1a) $R = [1, \sqrt{-2}] = \{n + m\sqrt{-2} \mid n, m \in \mathbb{Z}\}.$


Then for this period structure, an abelian integral $U$ (always of the first kind) is defined
in the period parallelogram, i.e.,


(1.1b) $U \in \mathbb{C}/R.$

Complex multiplication of $R$ by $\sqrt{-2}$ produces a sublattice of $R$. In matrix form,


(1.2a) $R = (1 \ \ \sqrt{-2}),$
(1.2b) $(\sqrt{-2})\,R = (1 \ \ \sqrt{-2}) \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}.$


If we were looking for multipliers of norm two, we should find two others:


(1.2c) $(1 + i)\,(1 \ \ i) = (1 \ \ i) \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$
(1.2d) $\tfrac{1}{2}(1 + \sqrt{-7})\,\bigl(1 \ \ \tfrac{1}{2}(1 + \sqrt{-7})\bigr) = \bigl(1 \ \ \tfrac{1}{2}(1 + \sqrt{-7})\bigr) \begin{pmatrix} 0 & -2 \\ 1 & 1 \end{pmatrix}.$

Of course, we note the above matrices are of determinant 2, and likewise

(1.2e) $N(\sqrt{-2}) = N(i + 1) = N(\tfrac{1}{2}(1 + \sqrt{-7})) = 2.$
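The three norm-two multipliers can be checked numerically. The sketch below is an illustration only: it assumes the convention that a lattice basis is a row vector multiplied on the right by an integer matrix, and the matrices listed are one consistent choice under that convention (other conventions give transposed or conjugated matrices). All names are ours.

```python
import cmath

W7 = (1 + cmath.sqrt(-7)) / 2  # the multiplier (1 + sqrt(-7))/2

# (multiplier mu, lattice basis (w1, w2), integer matrix ((a, b), (c, d)))
CASES = [
    (cmath.sqrt(-2), (1, cmath.sqrt(-2)), ((0, -2), (1, 0))),
    (1 + 1j, (1, 1j), ((1, -1), (1, 1))),
    (W7, (1, W7), ((0, -2), (1, 1))),
]

def row_times_matrix(basis, m):
    """Row vector (w1, w2) times the 2x2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    w1, w2 = basis
    return (a * w1 + c * w2, b * w1 + d * w2)

def is_norm_two_multiplier(mu, basis, m, tol=1e-12):
    """Check mu*(w1, w2) == (w1, w2)*m, det(m) == 2 and |mu|^2 == 2."""
    (a, b), (c, d) = m
    lhs = (mu * basis[0], mu * basis[1])
    rhs = row_times_matrix(basis, m)
    same = all(abs(x - y) < tol for x, y in zip(lhs, rhs))
    return same and a * d - b * c == 2 and abs(abs(mu) ** 2 - 2) < tol
```

Each case maps the lattice into a sublattice of index two, which is the determinant of the integer matrix.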


2. The elliptic curve. If an elliptic curve has a two-to-one mapping onto itself (a
two-isogeny), then its period lattice must be equivalent to itself by a complex multiplier
mapping this lattice into a sublattice of index two. Again classically, this is an isogeny
found as follows:
The elliptic curve is written with (cross-ratio) parameter $\lambda\ (= s^2)$,


(2.1a) $y^2 = x(x - 1)(x - s^2), \quad (s^2 \neq 0, 1).$


We write the parameter $\lambda$ as a square with the hindsight that its square root $s$ will emerge
in the computation. There are many equivalent forms of the elliptic curve (2.1a). The
$j$-invariant, defined as


(2.1b) $j[\lambda] = 256(1 - \lambda + \lambda^2)^3 / \lambda^2(\lambda - 1)^2,$


is algebraic in the coefficients of (2.1a) and it classifies elliptic curves up to isomorphism.
Note that $j[\lambda]$ is invariant under the transformation group ($S_3$) permuting the points
$\{0, 1, \infty\}$:


(2.1c) $\lambda \to \lambda' \in \{\lambda,\ 1/\lambda,\ 1 - \lambda,\ (\lambda - 1)/\lambda,\ \lambda/(\lambda - 1),\ 1/(1 - \lambda)\}.$


If the period parallelogram has ratio $\tau$ (with $\Im\tau > 0$), then $j$ has the well-known transcendental
expansion in $q = \exp 2\pi i\tau$,


(2.1d) $j(\tau) = 1/q + 744 + 196884q + 21493760q^2 + \cdots$


(The direct connection between the periods like $\tau$ and the coefficients like $s^2$ is not easily
available in the hyperelliptic case which follows).
The traditional Gauss-Landen transformation,


(2.2a) $x_0 = -\dfrac{(x - 1)(x - s^2)}{x(1 + s)^2},$

(2.2b) $y_0 = \dfrac{i\,y\,(x^2 - s^2)}{x^2(s + 1)^3},$


generates the unique correspondence of $x_0$ with the pair $\{x, s^2/x\}$ which extends to


(2.2c) $(x_0, y_0) \leftrightarrow (x, y).$


With the new parameter,


(2.2d) $s_0 = \dfrac{s - 1}{s + 1},$

we verify by substitution,


(2.2e) $y_0^2 = x_0(x_0 - 1)(x_0 - s_0^2).$
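The substitution check can be done in exact rational arithmetic. The sketch below assumes the Gauss-Landen transformation in the form $x_0 = -(x-1)(x-s^2)/(x(1+s)^2)$, $y_0 = i\,y\,(x^2-s^2)/(x^2(s+1)^3)$, $s_0 = (s-1)/(s+1)$; it works with $y^2$ and $y_0^2$ only, so no square roots or complex numbers are needed. The function name is ours.

```python
from fractions import Fraction

def gauss_landen_residual(s, x):
    """Return y0^2 - x0*(x0 - 1)*(x0 - s0^2); it should be exactly zero.

    s and x are Fractions with x != 0 and s != -1; the point (x, y) is
    assumed to lie on y^2 = x(x - 1)(x - s^2).
    """
    y_sq = x * (x - 1) * (x - s**2)                          # curve (2.1a)
    x0 = -(x - 1) * (x - s**2) / (x * (1 + s) ** 2)          # (2.2a)
    # square of (2.2b): the factor i contributes i^2 = -1
    y0_sq = -y_sq * (x**2 - s**2) ** 2 / (x**4 * (s + 1) ** 6)
    s0 = (s - 1) / (s + 1)                                   # (2.2d)
    return y0_sq - x0 * (x0 - 1) * (x0 - s0**2)
```

Because the identity is polynomial in $x$ and $s$, a handful of rational test points already gives good evidence; a computer algebra system would confirm it symbolically.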


Note that if we replace $x_0$ by $1 - x_0$ then $s_0$ would be replaced by

(2.2f) $(1 - s_0^2)^{1/2} = \sqrt{s}\,\big/\,\bigl(\tfrac{1 + s}{2}\bigr),$


the ratio of geometric-to-arithmetic means of 1 and $s$.
Since $x = \infty$ corresponds to $x_0 = \infty$, we see from (2.1a) and (2.2de) that


(2.3) $i(s + 1) \int_\infty^{x} \dfrac{dx}{y} = \int_\infty^{x_0} \dfrac{dx_0}{y_0}$ (modulo periods).

In terms of the $j$-function, we have the two values

(2.4a) $X = j[s^2], \quad Y = j[s_0^2],$


and on eliminating $s$ and $s_0$ using (2.2d), we obtain the modular equation of order two,

(2.4b) $(\Phi_2(X, Y) =)\ X^3 + Y^3 - X^2Y^2 + 1488XY(X + Y) - 162000(X^2 + Y^2) + 40773375XY + 8748000000(X + Y) - 157464000000000 = 0.$
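The elimination leading to the modular equation can be spot-checked exactly. The sketch below assumes the $j$-invariant in the form $j[\lambda] = 256(1-\lambda+\lambda^2)^3/(\lambda^2(\lambda-1)^2)$ and the parameter relation $s_0 = (s-1)/(s+1)$; the function names are ours, and rational values of $s$ are used so that the arithmetic is exact.

```python
from fractions import Fraction

def j_lambda(lam):
    """j-invariant of y^2 = x(x - 1)(x - lam), for lam not in {0, 1}."""
    return 256 * (1 - lam + lam**2) ** 3 / (lam**2 * (lam - 1) ** 2)

def phi2(X, Y):
    """Classical modular polynomial of order two."""
    return (X**3 + Y**3 - X**2 * Y**2 + 1488 * X * Y * (X + Y)
            - 162000 * (X**2 + Y**2) + 40773375 * X * Y
            + 8748000000 * (X + Y) - 157464000000000)

def modular_residual(s):
    """phi2(j[s^2], j[s0^2]) with s0 = (s - 1)/(s + 1); s not in {0, 1, -1}."""
    s0 = (s - 1) / (s + 1)
    return phi2(j_lambda(s**2), j_lambda(s0**2))
```

As a sanity check on the coefficients, `phi2(1728, 1728)` vanishes, in line with the curve of $j$-invariant 1728 admitting the multiplier $1 + i$ of norm two.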

3. Verification of Gauss-Landen. We return to the substitution (2.2a) to see how it
might be used as a model for the hyperelliptic case where the abelian differentials (always
of the first kind) are mapped rather than curves.

We try to reconstruct the relation (2.2a), starting with

(3.1a) $x_0 = \dfrac{(x - 1)(x - s^2)\,c}{x}.$

This is reasonable since the singularities $x \in \{0, 1, \infty, s^2\}$ go into the subset $x_0 \in \{0, \infty\}$.
To find the constant $c$ we try a substitution into the abelian differential of the first kind,
namely


(3.1b) $\dfrac{dx_0}{\sqrt{x_0(x_0 - 1)(x_0 - s_0^2)}} = \dfrac{\sqrt{c}\,dx\,(x - s)(x + s)}{\sqrt{x(x - 1)(x - s^2)\,q_1(x)\,q_2(x)}}$

where we designate

(3.1c) $q_1(x) = cx^2 - cxs^2 - cx + cs^2 - x,$

(3.1d) $q_2(x) = cx^2 - cxs^2 - cx + cs^2 - s_0^2x.$


It is now clear that the quadratics $q_1(x)$, $q_2(x)$ can be conveniently taken as perfect squares
to make sure the right-hand radical in (3.1b) remains elliptic. Thus


(3.2a) $\mathrm{disc}\ q_1 = (cs^2 - 2cs + c + 1)(cs^2 + 2cs + c + 1) = 0,$
(3.2b) $\mathrm{disc}\ q_2 = (cs^2 + 2cs + c + s_0^2)(cs^2 - 2cs + c + s_0^2) = 0.$


There are four ways the two discriminants can vanish; we choose the second factor from
each of (3.2ab) and solve each for $c$. Thus a good choice is

(3.2c) $c = -\dfrac{1}{(s + 1)^2} = -\dfrac{s_0^2}{(s - 1)^2},$

and the common value of $c$ also yields the value of $s_0$ in (2.2d).
In essence we derive the mapping (2.2abd) as one of many mappings for producing a
relation like (2.3). The parametric relations on the coefficients are only proved sufficient.
The fact that they cover all cases of degree two is unimportant here since a corresponding
result is not attempted for the hyperelliptic generalization.




4. Singular values. The singular cases of (2.2abd) are those discrete values of the
parameter $s$ where

(4.1a)    $j[s^2] = j[s_0^2]$,

so the elliptic curve is the same in (2.1a) and (2.2e). Now (2.3) leads to

(4.1b)    $\mu \displaystyle\int_\infty^{1} \frac{dx}{y} = \int_\infty^{0} \frac{dx}{y}$    modulo periods,

where $\mu$ is a complex multiplier (to be determined below). Thus we have two periods of
the same elliptic curve with ratio equal to the complex multiplier $\mu$.

When (4.1a) leads to $s_0 = \pm s$ then $\mu = i(s+1)$. When, because of (2.1c), we use
$\lambda' = 1 - \lambda$, or $1 - s_0^2 = s^2$, then $\mu = (s+1)$ (as the transformation $x \to 1 - x$ must be
used to cancel $s_0^2 \to 1 - s_0^2$, and this removes the $i$). The information is all contained in
this table:

          equation              root                       value of $\mu$                  $j[s^2]$
(4.2)     $s_0 = -s$            $s = -1+\sqrt{2}$          $i(s+1) = \sqrt{-2}$            $8000$
          $s_0 = s$             $s = i$                    $i(s+1) = -1+i$                 $1728$
          $s_0^2 = 1 - s^2$     $s = (-3+\sqrt{-7})/2$     $s+1 = (-1+\sqrt{-7})/2$        $-3375$.
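The three rows of the table can be checked numerically. The following is a minimal sketch, again assuming the normalization $j[\lambda] = 256(\lambda^2-\lambda+1)^3/(\lambda^2(\lambda-1)^2)$ and the convention $s_0 = (s-1)/(s+1)$; the dictionary keys and function names are illustrative.

```python
import cmath


def j_of_lambda(lam):
    """Assumed normalization of j for y^2 = x(x-1)(x-lam)."""
    return 256 * (lam * lam - lam + 1) ** 3 / (lam * lam * (lam - 1) ** 2)


def landen_parameter(s):
    """s0 as a function of s (assumed sign convention)."""
    return (s - 1) / (s + 1)


# Roots listed in the table, keyed by the defining equation.
SINGULAR_ROOTS = {
    "s0 = -s": cmath.sqrt(2) - 1,
    "s0 = s": 1j,
    "s0^2 = 1 - s^2": (-3 + cmath.sqrt(-7)) / 2,
}


def singular_j_values():
    """Evaluate j[s^2] at each tabulated root (floating point)."""
    return {name: j_of_lambda(s * s) for name, s in SINGULAR_ROOTS.items()}
```

Up to rounding, `singular_j_values()` should reproduce the last column of the table, and each root should satisfy its defining equation under `landen_parameter`.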


REAL MULTIPLICATION OF DEGREE TWO 


5. Real multiplication. Here we are dealing with a period matrix arising from
(hyperelliptic) curves of genus two. There are two independent abelian integrals $(U, U')$ of
the first kind with four pairs of periods for four independent paths on the Riemann surface.
Analogously with §1 (above), there are four pairs of period vectors forming a lattice

(5.1a)    $R = [R_{\cdot 1}, R_{\cdot 2}, R_{\cdot 3}, R_{\cdot 4}] = \{n_1R_{\cdot 1} + n_2R_{\cdot 2} + n_3R_{\cdot 3} + n_4R_{\cdot 4}\} \quad (n_i \in \mathbb{Z})$.

Then $U, U'$ are defined modulo periods as follows:

(5.1b)    $\begin{pmatrix} U \\ U' \end{pmatrix} \in \mathbb{C}^2/R$.

To simplify the lattice $R$, we select linear combinations of $U, U'$ using GL(2,C) to
produce suitable independent abelian integrals and from GL(4,Z) for the periods to reduce
(5.1a) to a standard form showing three parameters $\rho, \rho', \rho''$ (subject to sign conditions
deferred to §13),

(5.1c)    $R = \begin{pmatrix} 1 & 0 & \rho & \rho' \\ 0 & 1 & \rho' & \rho'' \end{pmatrix}$.


(This is possible since the periods are those of abelian integrals). 
The essence of real multiplication is that another application of $S \in$ GL(2,R) on $R$ can
produce a subbasis (as in (1.2b)),

(5.2a)    $SR = RM, \qquad M \in$ GL(4,Z).

This can happen only in special (singular) cases: Assume, e.g., $2\rho'' = \rho$. Then

(5.2b)    $S = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}: \qquad SR = \begin{pmatrix} 0 & 2 & 2\rho' & \rho \\ 1 & 0 & \rho & \rho' \end{pmatrix} = R \begin{pmatrix} S & 0 \\ 0 & S^{t} \end{pmatrix}$.

This is called a “real” multiplication because, e.g., the relation

(5.2c)    $S^2 = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2E$,

shows $S$ has real eigenvalues $\pm\sqrt{2}$. Also the period vectors of $SR$ form a lattice as in (5.1a)
with the property that

(5.2d)    $R/SR \approx \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$.

The (absolute) value of the determinant of $S$ leads to the characteristic property (5.2d) of
lattices (and jacobians).
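The relation $SR = RM$ in the singular case $2\rho'' = \rho$ can be checked on a sample period matrix. In the sketch below the block form of $M$ (the matrix $S$ on the first two periods and its transpose on the last two) is written out explicitly as an assumption about (5.2b), and the sample values of $\rho', \rho''$ are arbitrary.

```python
def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def period_matrix(rho_p, rho_pp):
    """Standard form (5.1c) with rho = 2*rho'' imposed."""
    return [[1, 0, 2 * rho_pp, rho_p], [0, 1, rho_p, rho_pp]]


S = [[0, 2], [1, 0]]
# Assumed integral matrix acting on the four periods.
M = [[0, 2, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 2, 0]]
```

With these definitions $|\det M| = 4$, which matches the index of $SR$ in $R$ stated in (5.2d).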


6. Hyperelliptic curves. The hyperelliptic curve (always of genus two here) has
three parameters (again written as squares),

(6.1a)    $H: \; y^2 = x(x-1)(x-s^2)(x-t^2)(x-u^2)$.

We refer frequently to the “quintic” roots $\{\infty, 0, 1, s^2, t^2, u^2\}$, or the Weierstrass points
(which are assumed distinct, of course). The abelian differentials form a linear space of
dimension two. Generators are chosen for convenience as $dU$ and $dU^*$ with

(6.1b)    $U = \displaystyle\int_\infty^{x} \frac{(x-s)\,dx}{y}, \qquad U^* = \int_\infty^{x} \frac{(x+s)\,dx}{y}$.

We are considering a correspondence of $H$ with $H_0$, where

(6.1c)    $H_0: \; y_0^2 = x_0(x_0-1)(x_0-s_0^2)(x_0-t_0^2)(x_0-u_0^2)$.

We try the mapping

(6.2a)    $\dfrac{(x-1)(x-s^2)}{x} = c\,\dfrac{(x_0-1)(x_0-s_0^2)}{x_0}$.

It clearly is analogous to (3.1a), but (2.2) fails. For now, we only say

(6.2b)    $x_0 \longleftrightarrow x$,

from the correspondence of the pairing

(6.2c)    $\{x_0, s_0^2/x_0\} \longleftrightarrow \{x, s^2/x\}$.


We may not, however, write an analog for the correspondence (2.2c); the best result is
blemished by a $\pm$ sign:

(6.2d)    $(x_0, y_0) \longleftrightarrow (x, \pm y)$.

6.3 LEMMA. A curve $C$ of genus $g$ cannot have an $N$-fold cover by a curve $C'$ of genus
$g'$ unless $g' \ge 1 + N(g-1)$.


The proof follows from the Riemann-Hurwitz formula
(6.3a)    $2(g'-1) = 2N(g-1) + W$,

where $W$ is the (nonnegative) number of branchings of the covering. We show some simple
illustrations when $N = 2$:

    $C' \to C$ (two-fold cover)

    $g'$   $g$   $W$
    0      0     2       $\{y = x^2\} \to \{y = X\}$
    1      0     4       $\{y^2 = x^3 - 1\} \to \{Y = x^3 - 1\}$
    1      1     0       “Gauss-Landen”
    2      0     6       $\{y^2 = x^6 - 1\} \to \{Y = x^6 - 1\}$
    2      1     2       $\{y^2 = x^6 - 1\} \to \{y^2 = X^3 - 1\}$
    2      2     $W < 0$ (?)   impossible!
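The $W$ column of the table follows directly from (6.3a). As a small illustration (function and variable names are ours, not the text's):

```python
def branch_count(g_cover, g_base, sheets=2):
    """Solve the Riemann-Hurwitz formula (6.3a) for W."""
    return 2 * (g_cover - 1) - 2 * sheets * (g_base - 1)


def cover_is_possible(g_cover, g_base, sheets=2):
    """Lemma 6.3: a cover can exist only if W is nonnegative."""
    return branch_count(g_cover, g_base, sheets) >= 0


# (g', g) pairs in the order of the table.
TABLE_ROWS = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```

Evaluating `branch_count` on `TABLE_ROWS` gives the branching numbers row by row; the last row comes out negative, which is why a genus-two curve cannot doubly cover another genus-two curve.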


Yet when we construct the differentials (in §10-11 below) we shall see the correspondence 
of jacobians is in one-to-two fashion (without sign ambiguities). 

The correspondence (6.2a) comes indirectly from Richelot who first broached the generalization
of the Gauss-Landen transformation. His work is reconstructed in [1] with
an ingenious formalism. The explicit form of (6.2a) is deducible from it but comes more
directly from Königsberger [12]. In Rohn [19], the two-to-two relation is described, essentially
as a four-to-one mapping, not of curves but of the pair $\{U, U^*\}$ into its image,
i.e., an isogeny of the two jacobians. We reexamine such matters in the light of Humbert’s
concepts.
Our objective for now is to specify $c$ and $s_0$ in (6.2a) so as to produce singular real multiplication.
This will mean a self-isogenous jacobian.
The field determined by the coefficients of $H$ in (6.1a), namely $\mathbb{Q}(s^2, t^2, u^2)$, is not large
enough for the coefficients of $H_0$; even the field of the parameters of $H$,

(6.4a)    $k = \mathbb{Q}(s, t, u)$,

requires an extension field, $[K : k] = 2$ (see §7 below), so that

(6.4b)    $c, s_0, t_0^2, u_0^2 \in K$.


7. The transformation parameters. We note that (6.2a) yields two values of $x$ for
each $x_0$. Rewriting it as

(7.1a)    $c\left(x_0 + \dfrac{s_0^2}{x_0} - 1 - s_0^2\right) = \left(x + \dfrac{s^2}{x} - 1 - s^2\right)$,

we find the conjugates $x, x'$ satisfy

(7.1b)    $x x' = s^2$.

We work directly (as in §3) with the differential from (6.1b),

(7.1c)    $dU = \dfrac{dx\,(x-s)}{y}$.

Our purpose is to transform it into a differential in $x_0$.

From (6.2a),

(7.2a)    $\dfrac{c\,dx_0\,(x_0^2 - s_0^2)}{x_0^2} = \dfrac{dx\,(x^2 - s^2)}{x^2}$.

So, $dU$ may be split into two factors,

(7.2b)    $dU = d\Omega \cdot A$,

where

(7.2c)    $d\Omega = \dfrac{dx_0\,(x_0^2 - s_0^2)\sqrt{c}}{x_0\sqrt{x_0(x_0-1)(x_0-s_0^2)}}$.

Using the conjugation notation of (7.1b), we see $x \to x'$ leads to $dU \to dU'$, $A \to A'$,
while $d\Omega = d\Omega'$. Then invoking Abel’s symmetry theorem, the sum of differentials

(7.2d)    $dU + dU' = d\Omega\,(A + A')$,

(7.2e)    $(A + A') = \dfrac{x}{(x+s)\sqrt{(x-t^2)(x-u^2)}} + \dfrac{x'}{(x'+s)\sqrt{(x'-t^2)(x'-u^2)}}$,

should be a differential on $H_0$ in (6.1c) from the mapping (6.2b).
It is a well-known trick to write the sum of radicals as a radical, e.g.,

(7.3a)    $(A + A') = \sqrt{A^2 + A'^2 + 2AA'}$.

Consider the internal radical,

(7.3b)    $AA' = \dfrac{x x'}{(x+s)(x'+s)\sqrt{(x-t^2)(x'-t^2)(x-u^2)(x'-u^2)}}$.

Use is now made of the symmetric functions

(7.3c)    $x x' = s^2, \qquad x + x' = s^2 + 1 + cx_0 - cs_0^2 - c + cs_0^2/x_0 \;(= \sigma_1)$.


As in (3.1cd), we obtain quadratic expressions which we make into perfect squares, such
as

(7.4a)    $(x - t^2)(x' - t^2) = s^2 + t^4 - \sigma_1 t^2 = q_1(x_0)/x_0$,

where

(7.4b)    $q_1(x_0) = -t^2cx_0^2 - x_0(-s^2 - t^4 + t^2s^2 + t^2 - t^2cs_0^2 - t^2c) - t^2cs_0^2$.

The discriminant of $q_1(x_0)$ vanishes, i.e.,

(7.4c)    $\Delta(t) = (-t^2cs_0^2 + 2t^2cs_0 - t^2c + t^2 - t^4 - s^2 + t^2s^2)$

(7.4d)    $\qquad\quad \cdot\,(-t^2cs_0^2 - 2t^2cs_0 - t^2c + t^2 - t^4 - s^2 + t^2s^2) = 0$.

If we worked with the factor $(x - u^2)(x' - u^2)$ instead, we should obtain similarly $\Delta(u) = 0$.
In either case there is the symmetry $s_0 \to -s_0$.
If we solve for $c$ in the (first) factor (7.4c) from $\Delta(t)$ (as shown), and if we also solve for
$c$ in the (second) factor (7.4d) from $\Delta(u)$, we obtain two equations

(7.4e)    $c = \dfrac{(1+t)(t-1)(s-t)(s+t)}{(s_0-1)^2t^2} = \dfrac{(1+u)(u-1)(s-u)(s+u)}{(-s_0-1)^2u^2}$.
The final result is now seen. We need $K = k(w)$ where
(7.5a)    $w^2 = (t^2-1)(u^2-1)(s^2-t^2)(s^2-u^2)$.

Then, comparing the two fractions in (7.4e), we find

(7.5b)    $s_0 = -\dfrac{ut^2 - ut^4 - us^2 + ut^2s^2 + wt}{ut^2 - ut^4 - us^2 + ut^2s^2 - wt}$,

and by taking the mean of the values in (7.4e), we find


(7.5c) c= ΡΈΕΙ 
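The consistency of (7.4e) with (7.5ab) can be illustrated numerically. The sketch below picks one branch of $w$ from (7.5a), forms $s_0$ from the quotient in (7.5b), and evaluates both fractions of (7.4e); it also evaluates the discriminant of the quadratic (7.4b). It is a floating-point illustration under the reading of these formulas given above, not a derivation, and the names are ours.

```python
import cmath


def transformation_parameters(s, t, u):
    """Return (s0, c from the t-fraction, c from the u-fraction)."""
    n_t = (t * t - 1) * (s * s - t * t)
    n_u = (u * u - 1) * (s * s - u * u)
    w = cmath.sqrt(n_t * n_u)                      # one branch of (7.5a)
    s0 = -(u * n_t + w * t) / (u * n_t - w * t)    # quotient in (7.5b)
    c_t = n_t / ((s0 - 1) ** 2 * t * t)            # first fraction of (7.4e)
    c_u = n_u / ((-s0 - 1) ** 2 * u * u)           # second fraction of (7.4e)
    return s0, c_t, c_u


def q1_discriminant(s, t, c, s0):
    """Discriminant in x0 of the quadratic q1(x0) of (7.4b)."""
    a = -t * t * c
    b = -(-s * s - t ** 4 + t * t * s * s + t * t
          - t * t * c * s0 * s0 - t * t * c)
    e = -t * t * c * s0 * s0
    return b * b - 4 * a * e
```

For generic real $s, t, u$ the two values of $c$ should agree to rounding error and the discriminant should vanish, which is the perfect-square property used in the text.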


Finally, $AA'$ is rational (no square roots), i.e.,

(7.6a)    $AA' = \dfrac{x_0^2\,s}{q_2(x_0)\,tuc\,(x_0 - s_0)(x_0 + s_0)}$,
(7.6b)    $q_2(x_0) = cx_0^2 + 2sx_0 + s^2x_0 + x_0 - cs_0^2x_0 - cx_0 + cs_0^2$.

It would seem purely “routine” as a problem in computer algebra (e.g., MAPLE V) to
find the square root of $A^2 + A'^2 + 2AA'$, but unfortunately the field $K = k(w)$ presents too
many difficulties because of the four parameters $s, t, u, w$. The fact that $w$ is a dependent
parameter makes it even harder! (The persistent error message is “object too long”.)

We could resort to the parallel formalism of Richelot to find expressions for $t_0^2$ and
$u_0^2$, using the four parameters $s, t, u, w$ (and an additional radical), but there is no exact
analogue of the Gauss-Landen transformation (2.2ab).

We do not show this work here, as our purpose is to specialize such results to the singular
case only. It will be characterized by $s = s_0$ although there are many equivalent ways,
e.g., $s = 1/s_0$, $s = t_0$, etc. (Recall that we only claim a sufficient condition for real
multiplication.)




HUMBERT’S CRITERION 


8. Humbert’s equations. We shall derive Humbert’s criterion [10] as a sufficient
condition for real multiplication of degree two. Humbert showed it is also necessary (by
stronger techniques using the Kummer surface and the theta-function).
We start with the condition that $s = s_0$. Then eliminating $w$ in (7.5ab), we obtain the

(8.1a)    resultant $= (t^2-1)(s^2-t^2)\,h_0h_2 \;(= 0)$.

Here $h_0 = 0$ and $h_2 = 0$ are two of these four equations generated under changes $s \to -s$,
$t \to -t$, $u \to -u$:

(8.1b)    $h_0 = s^2t - s^2u - st - su + t^2su + t^2u + su^2t - u^2t$,
(8.1c)    $h_1 = s^2t - s^2u + st + su - t^2su + t^2u - su^2t - u^2t$,
(8.1d)    $h_2 = -s^2t - s^2u + st - su + t^2su + t^2u - su^2t + u^2t$,
(8.1e)    $h_3 = s^2t + s^2u + st - su + t^2su - t^2u - su^2t - u^2t$.


Thus, if we are dealing with the actual coefficients $(s^2, t^2, u^2)$, we should have to multiply
all four and get Humbert’s equation,

(8.1f)
$$\begin{aligned}
H(s^2,t^2,u^2) = h_0h_1h_2h_3 ={}& 2s^6u^6t^2 + 12s^2u^4t^4 - u^4t^8s^4 - u^4s^4 \\
& - u^8t^4 - t^4s^8 - s^8u^4 + 2u^6t^6 + 2t^4s^6 + 2s^6u^4 - s^4u^8t^4 \\
& - 16u^4t^6s^4 + 2u^4t^8s^2 + 12u^6t^6s^2 + 12u^2t^6s^4 + 2s^4t^2u^2 \\
& + 2s^2u^2t^6 + 2s^2u^6t^2 - 16s^4u^2t^4 + 12s^6u^2t^2 - 16s^4u^4t^2 \\
& + 40u^4t^4s^4 + 2u^6t^6s^4 - 16u^4t^6s^2 + 2u^2t^6s^6 + 12u^4t^4s^6 \\
& - 16s^2u^6t^4 + 2s^2u^8t^4 + 12s^4u^6t^2 - 16t^4s^6u^2 - 16u^6t^4s^4 \\
& - 16s^6u^4t^2 + 2t^2s^8u^2 - t^4s^4 - u^4t^8 \;(= 0).
\end{aligned}$$
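The product of the four factors can be compared with a compact closed form. Writing $S = s^2$, $T = t^2$, $U = u^2$, pairing the factors as $(h_0h_1)(h_2h_3)$ and expanding leads to $H = 4TU\,G^2 - (T+U)^2(S-1)^2(S-TU)^2$ with $G = 2S(T+U) - S^2 - TU - STU - S$. This closed form is our own rearrangement, offered as a cross-check rather than as a formula from the text; the sketch verifies it against the product at sample points.

```python
from fractions import Fraction


def h0(s, t, u):
    """The factor (8.1b)."""
    return (s * s * t - s * s * u - s * t - s * u
            + t * t * s * u + t * t * u + s * u * u * t - u * u * t)


def h1(s, t, u):
    return h0(-s, t, u)      # (8.1c)


def h2(s, t, u):
    return h0(s, -t, u)      # (8.1d)


def h3(s, t, u):
    return h0(-s, t, -u)     # (8.1e)


def humbert(S, T, U):
    """Compact form of H(s^2, t^2, u^2) (our rearrangement)."""
    G = 2 * S * (T + U) - S * S - T * U - S * T * U - S
    return 4 * T * U * G * G - (T + U) ** 2 * (S - 1) ** 2 * (S - T * U) ** 2


def humbert_from_factors(s, t, u):
    return h0(s, t, u) * h1(s, t, u) * h2(s, t, u) * h3(s, t, u)
```

Because the compact form depends only on $S$, $T$, $U$, it makes visible that the product is a polynomial in the squares, as claimed for (8.1f).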
Curiously, Humbert failed to use square parameters, so his equation did not have this
factorization. It was written (essentially) as homogeneous of degree eight.
To see the symmetries involved, number the branch points of the hyperelliptic curve
(6.1a), and introduce $v (= 1)$ for homogeneity. (Thus $s \to s/v$, $t \to t/v$, $u \to u/v$ in (8.1b-f)
with the $h_0, \cdots, h_3$ and $H$ replaced by their numerators $v^4h_0, \cdots, v^4h_3$ and $v^{16}H$.) Then
the (coefficient) parameters are

              “quintic” roots    $0$   $\infty$   $v^2$   $t^2$   $s^2$   $u^2$
(8.2a)        vertices           1     2          3       4       5       6

Accordingly, the equations (8.1bcde) and (8.1f) are invariant under the dihedral group (of
order eight):

(8.2b)    $D_4 = \langle (3456), (35) \rangle$.


Therefore, with $4!/8 = 3$ equations conjugate to (8.1f), and 15 ways of selecting the two
points $\{1, 2\}$ for $0$ and $\infty$, there is a total of 45 equations of degree 8 in the quintic
coefficients which constitutes Humbert’s criterion. The process of defining the manifold
of real multiplications of degree two itself is far from easy. (For the total manifold of
hyperelliptic curves, see [11].)
A simplified version of the factors of Humbert’s equation can be given with the help of
the symbol

(8.3a)    $[\alpha\beta] = \dfrac{\alpha^2 - \beta^2}{\alpha\beta}$.

Then with $v$ ($= 1$ for homogeneity), we can see

(8.3b)    $h_0 = stuv\,([ts] + [su] + [uv] - [vt])$,
(8.3c)    $\quad\;\; = uv(t^2 - s^2) + vt(s^2 - u^2) + ts(u^2 - v^2) - su(v^2 - t^2)$,

and (ignoring sign), $h_2, h_3, h_1$ all come from $h_0$ by the cyclic permutation of $(v, t, s, u)$,
from (8.2b).
Finally, from $h_0 = 0$, we obtain a rational parametrization, using

(8.4a)    $r = ut$,

then with only two parameters $r$ and $s$, we express all coefficients of the quintic, i.e.,

(8.4b)    $t^2 = r\,\dfrac{r - rs + s^2 + s}{r + rs + s^2 - s} \;(= a_1)$,

(8.4c)    $u^2 = r\,\dfrac{r + rs + s^2 - s}{r - rs + s^2 + s} \;(= a_2)$,

and the singular quintic (from (6.1a)) now becomes
(8.4d)    $y^2 = x(x-1)(x-s^2)(x-a_1)(x-a_2)$.
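The parametrization can be tested against Humbert's equation with exact arithmetic. The sketch below uses a compact form of $H$ in the squares $S = s^2$, $T = t^2$, $U = u^2$ (our own rearrangement of the product $h_0h_1h_2h_3$, so treat it as an assumption) and checks that $H(s^2, a_1, a_2)$ vanishes for rational $r, s$.

```python
from fractions import Fraction


def quintic_roots(r, s):
    """Return (a1, a2) from (8.4bc) for rational r, s."""
    r, s = Fraction(r), Fraction(s)
    d1 = r - r * s + s * s + s
    d2 = r + r * s + s * s - s
    return r * d1 / d2, r * d2 / d1


def humbert(S, T, U):
    """Compact form of Humbert's polynomial in the squares (assumed)."""
    G = 2 * S * (T + U) - S * S - T * U - S * T * U - S
    return 4 * T * U * G * G - (T + U) ** 2 * (S - 1) ** 2 * (S - T * U) ** 2


def is_singular(r, s):
    """True when the quintic (8.4d) satisfies Humbert's equation."""
    a1, a2 = quintic_roots(r, s)
    return humbert(Fraction(s) ** 2, a1, a2) == 0
```

For example, $r = 3$, $s = 2$ gives $a_1 = 9/11$ and $a_2 = 11$, with $a_1a_2 = r^2$ as required by (8.4a).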


9. Humbert’s geometric criterion. From the singularities of Kummer surfaces, 
Humbert was able to express his criterion as a property of conic sections. 


9.1 HUMBERT’S CONIC CONSTRUCTION. Take an arbitrary (nondegenerate) conic $C$ in
the $x, y, z$ projective plane and parametrize it in any way, e.g.,

(9.1a)    $C: \; x : z : y = \phi_0(w) : \phi_1(w) : \phi_2(w)$,

where $\phi_i(w)$ represent suitable polynomials of degree at most two. Then let $\{1, 2, 3, 4, 5, 6\}$
represent the points found by substituting for $w$ the values from the table (8.2a) (with
$v = 1$). Construct the tangents at all six points. The tangents at 3, 4, 5, 6 (in that order)
form a quadrilateral by the successive intersections 34, 45, 56, 63, as in Figure 1.

Then the condition for real multiplication of degree two is that another conic $C'$ exist
which has a common tangent with $C$ from points 1 and 2 and which passes through the
vertices of the quadrilateral 34, 45, 56, 63.


The details were omitted by Humbert, but we can make a straightforward computation 
to verify that the conic construction is equivalent to Humbert’s equation (8.1f). 


Figure 1. Humbert’s Criterion

Actually, all conics have (in C) parametrizations projectively equivalent with each other,
hence with the simplest:

(9.1b)    $C: \; y = w^2, \quad x = w, \quad (z = 1)$,

(in affine coordinates). We note that on $C$,

(9.1c)    $\{\text{tangent line at } (a, a^2)\} = \{y + a^2 = 2xa\}$.

The intersection $(x, y)$ of the tangents at $(g_1, g_1^2)$ and $(g_2, g_2^2)$ is

(9.1d)    $\{y + g_1^2 = 2xg_1\} \cap \{y + g_2^2 = 2xg_2\} = \left(\dfrac{g_1 + g_2}{2},\; g_1g_2\right)$.


Therefore we look for the second conic $C'$ having the common tangent to $C$ at $\infty$ and
$0$. These two conditions leave three parameters in

(9.2a)    $C': \; (q(x,y) =)\; x^2 + 2Bxy + B^2y^2 + 2Ax + A^2 + 2Gy = 0$.

The parameters $A, B, G$ result from the common tangencies as follows:

(9.2b)    $(x, y) \to \infty \;\Rightarrow\; q(x,y) \approx (x + By)^2, \qquad y \to 0 \;\Rightarrow\; q(x,y) \approx (x + A)^2$.

We now have four equations for the three unknowns $A, B, G$:

(9.2c)    $(q_{34} =)\; q\!\left(\dfrac{1+t^2}{2},\, t^2\right) = 0, \qquad (q_{45} =)\; q\!\left(\dfrac{t^2+s^2}{2},\, t^2s^2\right) = 0,$
          $(q_{56} =)\; q\!\left(\dfrac{s^2+u^2}{2},\, s^2u^2\right) = 0, \qquad (q_{63} =)\; q\!\left(\dfrac{u^2+1}{2},\, u^2\right) = 0.$

We first eliminate $G$ (which is linear) between the three successive pairs of the equations
$q_{34}, q_{45}, q_{56}, q_{63}$, so $q_{34}, q_{45}$ result in (say) $q_{345}$, etc.

We thereby obtain a trio of equations, $q_{345}, q_{456}, q_{563}$, nonlinear in $A$ and $B$. Now we
have a choice of which of $A$ or $B$ to eliminate next.

So we eliminate $A$ between the two successive pairs, and $q_{345}, q_{456}$ result in (say) $q_{3456}$,
etc. We now have the two equations $q_{3456}, q_{4563}$. The remaining variable, namely $B$, is
then eliminated from this one last pair to obtain the (partial) “resultant-GAB.”

We next eliminate $B$ first (before $A$) so we obtain the corresponding (partial) “resultant-GBA.”

It turns out that each of the partial resultants has extraneous factors, which are eliminated
by comparison. Thus to within a numerical factor,

(9.2d)    gcd(resultant-GAB, resultant-GBA) $= H(s^2, t^2, u^2)$.


INTEGRAL IDENTITIES 
10. Transformation of abelian integrals. We complete the identification of the
singular case (8.4bcd) by showing the two-to-two mapping of each of the hyperelliptic
curves $H$ and $H_0$ of §6 and the mapping of the abelian integrals.
Now with $s = s_0$, the computer algebra simplifies sufficiently. We state the results and
sketch the computations in §11. Starting again with a relation of type (6.2a),

(10.1a)    $\dfrac{(x-1)(x-s^2)}{x} = \kappa\,\dfrac{(x_0-1)(x_0-s^2)}{x_0}$,

we look for a constant $\kappa$ to provide the self-mapping of the abelian integrals. We shall
show in the next section

(10.1b)    $\kappa = -\dfrac{(r^2-s^2)(r+s^2)(r-1)}{r(r-rs+s^2+s)(r+rs+s^2-s)}$.
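The role of the constant $\kappa$ can be illustrated with exact arithmetic. The sketch below takes $\kappa$ with a leading minus sign (an assumption: it is the sign for which the perfect-square condition of §7, specialized to $s_0 = s$, is satisfied) and checks that the quadratic in $x_0$ obtained from $x_0(x - a_1)(x' - a_1)$ has zero discriminant. It also checks that the substitution $r \to s^2(1-r)/(r+s^2)$ inverts $\kappa$. All names are illustrative.

```python
from fractions import Fraction


def d1(r, s):
    return r - r * s + s * s + s


def d2(r, s):
    return r + r * s + s * s - s


def a1(r, s):
    return r * d1(r, s) / d2(r, s)                       # (8.4b)


def kappa(r, s):
    """Candidate value of kappa (sign assumed negative)."""
    return -(r * r - s * s) * (r + s * s) * (r - 1) / (r * d1(r, s) * d2(r, s))


def involution(r, s):
    return s * s * (1 - r) / (r + s * s)


def square_defect(r, s):
    """Discriminant in x0 of x0*(x - a1)*(x' - a1); zero means a square."""
    k, a = kappa(r, s), a1(r, s)
    lead = -a * k
    mid = s * s + a * a - a * (1 + s * s) + a * k * (1 + s * s)
    const = -a * k * s * s
    return mid * mid - 4 * lead * const
```

With $r = 3$, $s = 2$ this gives $\kappa = -70/99$; flipping the sign of $\kappa$ makes `square_defect` nonzero, which is why the negative sign is used here.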


We shall also see that the hyperelliptic curves are

(10.1c)    $H: \; y^2 = x(x-1)(x-s^2)(x-a_1)(x-a_2)$,
(10.1d)    $H_0: \; y_0^2 = x_0(x_0-1)(x_0-s^2)(x_0-b_1)(x_0-b_2)$,

where $a_1$ and $a_2$ are given in (8.4bc) and $b_1$ and $b_2$ are derived by the involution

(10.1e)    $r \to s^2(1-r)/(r+s^2), \qquad s \to s$,

which transforms

(10.1f)    $a_i \to b_i$.

We further note from (10.1a), for given $x_0$ (or $x$), that two conjugate values of $x$ (resp.
$x_0$) are uniquely determined by

(10.2)    $x x' = s^2, \qquad x_0x_0' = s^2$,

while at $x'$ (or $x_0'$) the corresponding value $y'$ (resp. $y_0'$) is determined with ambiguity
of sign.


10.3 MAIN IDENTITY. The abelian differentials at the conjugate points add as follows:

(10.3a)    $dx_0(x_0 - s)/y_0 + dx_0'(x_0' - s)/y_0' = \gamma_1\,dx\,(x - a_1)/y$,
(10.3b)    $dx_0(x_0 - s)/y_0 - dx_0'(x_0' - s)/y_0' = \gamma_1\,dx'(x' - a_1)/y'$,
(10.3c)    $dx_0(x_0 + s)/y_0 + dx_0'(x_0' + s)/y_0' = \gamma_2\,dx\,(x - a_2)/y$,
(10.3d)    $dx_0(x_0 + s)/y_0 - dx_0'(x_0' + s)/y_0' = \gamma_2\,dx'(x' - a_2)/y'$.


The constants satisfy 


(10s) og = eet “π΄ ers) 4 - ales) 

(r —rs+s* + s)rs?(r -- 1) rs*(r -- 1) 
Note that the relation between (10.3ab) and (10.3cd) is $s \to -s$, leading to $(a_1, a_2) \to
(a_2, a_1)$ and $\gamma_1 \to \gamma_2$.

We must consider the ambiguity of signs. First of all, $\gamma_1$ has an arbitrary sign, but it
determines $\gamma_2$. Furthermore $y_0$ and $y_0'$ can also be chosen arbitrarily, but then $yy'$ has a
sign determined by the product of (10.3a) and (10.3b) (which cancels the ambiguity of $y_0$
and $y_0'$). The signs are consistent for (10.3cd).

We define the jacobians of $H$ and $H_0$ on each Riemann surface (modulo periods of
course) as column vectors with components consisting of abelian integrals with upper
limits $P = (x, y)$, $P_0 = (x_0, y_0)$, etc., and some convenient lower limit, say $\infty$. Thus

(10.4a)    $H: \; u(P) = \gamma_1 \displaystyle\int_\infty^{P} dx\,(x - a_1)/y, \qquad v(P) = \gamma_2 \int_\infty^{P} dx\,(x - a_2)/y$,

(10.4b)    $H_0: \; u_0(P_0) = \displaystyle\int_\infty^{P_0} dx_0(x_0 - s)/y_0, \qquad v_0(P_0) = \int_\infty^{P_0} dx_0(x_0 + s)/y_0$.

Thus the jacobians are the column vectors

(10.4c)    $H: \; J(P) = (u(P), v(P))^{T}$,

(10.4d)    $H_0: \; J_0(P_0) = (u_0(P_0), v_0(P_0))^{T}$.

Note that a change of (say) $(x, y) \to (x, -y)$ will cause $u \to -u$, $v \to -v$, $J(P) \to -J(P)$.

10.5 MAPPING OF JACOBIANS. Theorem (10.3) may be rewritten with (10.3ac) and
(10.3bd) grouped as follows:

(10.5a)    $J_0(x_0, y_0) + J_0(x_0', y_0') = J(x, y)$,

(10.5b)    $J_0(x_0, y_0) + J_0(x_0', -y_0') = J(x', y')$.

The Jacobi inversion theorem leads to

(10.5c)    $(u(x,y), v(x,y)) \to \{(u_0(x_0,y_0), v_0(x_0,y_0)),\; (u_0(x_0',y_0'), v_0(x_0',y_0'))\}$;

(10.5d)    $(u(x',y'), v(x',y')) \to \{(u_0(x_0,y_0), v_0(x_0,y_0)),\; (u_0(x_0',-y_0'), v_0(x_0',-y_0'))\}$.

In jacobian notation, more simply,

(10.5e)    $J(P) \to \{J_0(P_0),\; J_0(P_0')\}$;

(10.5f)    $J(P') \to \{J_0(P_0),\; -J_0(P_0')\}$.

Thus the jacobians of $H$ and $H_0$ have a one-to-two correspondence that the curves could
not have (recall §6).


11. Proof of the main identity. The steps follow the pattern of §7 (with $s = s_0$), so
a brief sketch might suffice. From (10.1a) (with $\kappa$ as yet unknown),

(11.1a)    $\kappa\,dx_0(x_0^2 - s^2)/x_0^2 = dx\,(x^2 - s^2)/x^2$.

Together with the internal substitution of (10.1a) into (10.1c), this gives us

(11.1b)    left hand (10.3ab) $= \dfrac{dx\,(x^2 - s^2)}{x\sqrt{\kappa\,x(x-1)(x-s^2)}}\,(A \pm A')$,

(11.1c)    $A = \dfrac{x_0}{(x_0+s)\sqrt{(x_0-b_1)(x_0-b_2)}}, \qquad A' = \dfrac{x_0'}{(x_0'+s)\sqrt{(x_0'-b_1)(x_0'-b_2)}}$.

What remains is a smaller computer algebra exercise than in §7 (fewer parameters): We
compute $(AA')^2$ and $A^2 + A'^2$ as rational functions of $x$ using (10.1a), e.g.,

(11.2a)    $x_0x_0' = s^2, \qquad x_0 + x_0' = 1 + s^2 + (x-1)(x-s^2)/(x\kappa)$.

As in §7, $AA'$ is rational when $(x_0 - b_1)(x_0' - b_1)$ is a perfect square. The discriminant
condition (as in (7.4cd)) gives the value of $\kappa$ in (10.1b). We have two values $\pm AA'$, so we
are simultaneously computing both of (10.3ab),

(11.2b)    $A \pm A' = \sqrt{A^2 + A'^2 \pm 2AA'}$.

The radicand becomes a perfect square again, and this gives the differential shown in
(10.3ab).
Finally, (10.3cd) follows from the substitution $s \to -s$.


12. Self-mapping of jacobians. The mapping of jacobians (10.5) from $H_0(x_0, y_0)$
to $H(x, y)$ can be recast as a mapping from $H(x, y)$ to itself (actually $H(z, w)$) by the
biunique change of variables

(12.1a)    $x_0 = (z-1)s^2/(z-s^2)$,
(12.1b)    $y_0 = w(s+1)(s-1)^2s^2/\big((z-s^2)^3\,\omega\big)$,

so the equation (10.1d) for $H_0$ becomes that of $H$ again, i.e.,

(12.1c)    $H: \; w^2 = z(z-1)(z-s^2)(z-a_1)(z-a_2)$.

The transformation (12.1a) comes from (10.1a) with $x$ and $z$ symmetrically disposed:

(12.1d)    $\dfrac{(x-1)(x-s^2)}{x} \cdot \dfrac{(z-1)(z-s^2)}{z} = \kappa\,(s^2-1)^2$.
Here, $\kappa$ comes from (10.1b) and

(12.1e)    $\omega^2 = -\dfrac{(r^2-s^2)(r+s^2)^2(s-1)}{(r+rs+s^2-s)(r-rs+s^2+s)(s+1)s^4}$.

The abelian differentials on the left in (10.3a-d) can be transformed back to $H(z, w)$ by
means of

(12.2a)    $dx_0(x_0 - s)/y_0 = \gamma_0\,dz\,(z+s)/w$.

At the same time the differentials on the right can be transformed to the “standard” ones
in (10.4a) by

(12.2b)    $(x - a_1) = (x+s)\alpha + (x-s)\beta, \qquad (x - a_2) = (x-s)\alpha^* + (x+s)\beta^*$,

(12.2c)    $\alpha = (s - a_1)/2s, \qquad \beta = (s + a_1)/2s$,

(with the operator $*$ denoting the involution $s \to -s$).
The transformation (10.5ab) of jacobians (as column vectors) now becomes

(12.3a)    $J(z, w) + J(z', w') = M\,J(x, y)$,

(12.3b)    $J(z, w) + J(z', -w') = M\,J(x', y')$.

Here $M$ is the matrix

(12.3c)    $M = \begin{pmatrix} \gamma_1\alpha & \gamma_1\beta \\ \gamma_2\beta^* & \gamma_2\alpha^* \end{pmatrix}$,

with

(12.3d)    $\gamma_1^2 = \dfrac{(s+1)(r+rs+s^2-s)^2}{(s-1)\,r(r-1)(r+s^2)}$,

(12.3e)    $\gamma_1\gamma_2 = -\dfrac{(r+rs+s^2-s)(r-rs+s^2+s)}{r(r-1)(r+s^2)}$.

It can then be seen (using the choice of signs from (12.3de)) that $M$ has determinant
$-2$ and trace zero. Hence its eigenvalues are $\pm\sqrt{2}$.
We now choose new jacobian coordinates to follow the eigenvectors, (say)

(12.4a)    $j(z, w) = \left(\displaystyle\int dz\,(e_{11}z + e_{12})/w,\; \int dz\,(e_{21}z + e_{22})/w\right)^{T}$,

and likewise (with the same eigenvectors) for $j(x, y)$.
With this choice of jacobian (12.3ab) becomes

(12.4c)    $j(z, w) + j(z', w') = T\,j(x, y)$,
(12.4d)    $j(z, w) + j(z', -w') = T\,j(x', y')$,

(12.4e)    $T = \begin{pmatrix} \sqrt{2} & 0 \\ 0 & -\sqrt{2} \end{pmatrix}$.

As before, (12.4c) and (12.4d) are equivalent (only one need be used).
The involution $T$ and the correspondence of $(z, w)$ with $(z', w')$ justify the kernel $\mathbb{Z}/2\mathbb{Z} \times
\mathbb{Z}/2\mathbb{Z}$ in the isogeny of the jacobian $j(z, w)$.


APPENDIX: THE HILBERT MODULAR INVARIANTS 
13. Hecke’s parameters. The period matrix $R$ in (5.1c) must satisfy the sign condition

(13.1a)    $\Im\rho\,\Im\rho'' - (\Im\rho')^2 > 0$,

when the periods are those of abelian integrals. A period matrix with such sign conditions
is called a Riemann matrix. We had the further restriction to produce real multiplication

(13.1b)    $\rho = 2\rho''$.

Hecke [7] introduced new variables

(13.1c)    $(\rho/2 =)\; \rho'' = \dfrac{\tau - \tau'}{2\sqrt{2}}, \qquad \rho' = -\dfrac{\tau + \tau'}{2}$,

for which the sign condition (13.1a) is satisfied by

(13.1d)    $\Im\tau > 0 > \Im\tau'$.

In (5.1c) we now replace $R$ by $SR$ (where the matrix $S$ denotes a linear combination of
the abelian integrals),

(13.1e)    $S = \begin{pmatrix} 1 & -\sqrt{2} \\ 1 & \sqrt{2} \end{pmatrix}$.

Then the Riemann matrix $R$ is rewritten as

(13.2a)    $R = \begin{pmatrix} 1 & -\sqrt{2} & \sqrt{2}\,\tau & -\tau \\ 1 & \sqrt{2} & -\sqrt{2}\,\tau' & -\tau' \end{pmatrix}$.

There are (again) four periods shown as column vectors for the two abelian integrals
represented by the rows. There is an obvious equivalence under change of variables through
multiplication by GL$_2$(C) on the left and unimodular change of period basis through
multiplication by SL$_4$(Z) on the right, i.e.,

(13.2b)    $R \approx \mathrm{GL}_2(\mathbb{C}) \cdot R \cdot \mathrm{SL}_4(\mathbb{Z})$.

The real multiplier ring isomorphic to $\mathbb{Z}[\sqrt{2}]$ is put in evidence (again) by the real action
on the variables:


(13.2c) (τ 4). = ( nore ve): 


The right-hand matrix columns clearly show only a subgroup of the periods of (13.2b) of 
index 4, i.e., of structure Z/2Z x Z/2Z as in (5.2d). 


14. The Hilbert modular group. Generically, this is the group $G$ acting on the
pair $(\tau, \tau')$ which preserves the equivalence classes of the Riemann matrices with real
multiplication.
In our context, first there is a symmetric subgroup $G^*$ of index 2,

(14.1a)    $G^*: \; \tau \to \dfrac{\alpha\tau + \beta}{\gamma\tau + \delta}, \qquad \tau' \to \dfrac{\alpha'\tau' + \beta'}{\gamma'\tau' + \delta'}$.

Here $\alpha, \beta, \gamma, \delta$ lie in $\mathbb{Z}[\sqrt{2}]$, and $\alpha', \beta', \gamma', \delta'$ are the conjugates under $\sqrt{2} \to -\sqrt{2}$. The
determinants also must satisfy

(14.1b)    $\alpha\delta - \beta\gamma = \lambda, \qquad \alpha'\delta' - \beta'\gamma' = \lambda'$,

for $\lambda$ a totally positive unit, actually $\lambda = (1 + \sqrt{2})^{2m}$ $(m \in \mathbb{Z})$. (In this case, it suffices to
take $\lambda = \lambda' = 1$ by using a factor of $\lambda^{1/2}$ on $\alpha, \beta, \gamma, \delta$, and likewise for the conjugates.)
The full Hilbert modular group has the alternating operation

(14.1c)    $A: \; \tau \to -\tau', \qquad \tau' \to -\tau$.

So $G = \langle G^*, A \rangle$. This group preserves the half-planes $\Im\tau > 0 > \Im\tau'$.


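To see the action on the Riemann matrices (as in Hecke [7]), first let the element of $G^*$
act so $R \to G^*(R)$. Then we remove the denominators by row multiplications,

(14.2a)    $\begin{pmatrix} \gamma\tau + \delta & 0 \\ 0 & \gamma'\tau' + \delta' \end{pmatrix} \cdot G^*(R) = R^*$,

where

(14.2b)    $R^* = \begin{pmatrix} (\gamma\tau + \delta) & -\sqrt{2}(\gamma\tau + \delta) & \sqrt{2}(\alpha\tau + \beta) & -(\alpha\tau + \beta) \\ (\gamma'\tau' + \delta') & \sqrt{2}(\gamma'\tau' + \delta') & -\sqrt{2}(\alpha'\tau' + \beta') & -(\alpha'\tau' + \beta') \end{pmatrix}$.

Using integral components $\{a_0, \cdots, d_1\}$ such that

(14.2c)    $\alpha = a_0 + a_1\sqrt{2}, \quad \beta = b_0 + b_1\sqrt{2}, \quad \gamma = c_0 + c_1\sqrt{2}, \quad \delta = d_0 + d_1\sqrt{2}$,

we can rewrite $R^*$ in terms of rational integers,

(14.2d)    $R^* = R \cdot M$, where $M = \begin{pmatrix} d_0 & -2d_1 & 2b_1 & -b_0 \\ -d_1 & d_0 & -b_0 & b_1 \\ c_1 & -c_0 & a_0 & -a_1 \\ -c_0 & 2c_1 & -2a_1 & a_0 \end{pmatrix}$.

Since it can be verified that

(14.2e)    determinant$(M) = (\alpha\delta - \beta\gamma)(\alpha'\delta' - \beta'\gamma') = 1$,

we see the substitutions of $G^*$ preserve the equivalence class of $R$.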
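The determinant identity (14.2e) lends itself to a quick integer check. In the sketch below an element $p_0 + p_1\sqrt{2}$ of $\mathbb{Z}[\sqrt{2}]$ is represented as a pair `(p0, p1)`; the matrix is the one displayed in (14.2d), and the helper names are ours. For general integer entries the determinant of $M$ should equal the norm of $\alpha\delta - \beta\gamma$, which reduces to 1 on the group $G^*$.

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def integral_matrix(a, b, c, d):
    """The 4x4 integer matrix M of (14.2d)."""
    (a0, a1), (b0, b1), (c0, c1), (d0, d1) = a, b, c, d
    return [[d0, -2 * d1, 2 * b1, -b0],
            [-d1, d0, -b0, b1],
            [c1, -c0, a0, -a1],
            [-c0, 2 * c1, -2 * a1, a0]]


def mul(p, q):
    """Product in Z[sqrt 2]."""
    return (p[0] * q[0] + 2 * p[1] * q[1], p[0] * q[1] + p[1] * q[0])


def norm(p):
    """Norm p * p' of p = p0 + p1*sqrt(2)."""
    return p[0] * p[0] - 2 * p[1] * p[1]


def substitution_determinant(a, b, c, d):
    """alpha*delta - beta*gamma as a pair."""
    ad, bc = mul(a, d), mul(b, c)
    return (ad[0] - bc[0], ad[1] - bc[1])
```

For instance $\alpha = \delta = \sqrt{2}$, $\beta = \gamma = 1$ has $\alpha\delta - \beta\gamma = 1$, and the corresponding $M$ is unimodular.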
It is a trivial matter to see that the alternating substitution

(14.3)    $A: \; \tau \to -\tau', \quad \tau' \to -\tau, \quad \sqrt{2} \to -\sqrt{2}$

simply interchanges the rows of $R$. So the Hilbert modular group $G = \langle G^*, A \rangle$ is a
group preserving the equivalence classes of $R$.


15. The modular function field. The Hilbert modular function field for $\mathbb{Q}(\sqrt{2})$ is
the analogue of $\mathbb{Q}(j(\tau))$ for the Klein modular group acting on $\tau$.
We start with the symmetric subfield $\mathbb{Q}(x, y)$ (see [6]). There

(15.1a)    $(x =)\; x(\tau, \tau') = H_2^2/H_4, \qquad (y =)\; y(\tau, \tau') = H_2H_4/H_6$,

and $H_k$ denotes special modular forms of (even) weight $k$, e.g.,

(15.1b)    $G^*: \; H_k\!\left(\dfrac{\alpha\tau + \beta}{\gamma\tau + \delta},\, \dfrac{\alpha'\tau' + \beta'}{\gamma'\tau' + \delta'}\right) = \big((\gamma\tau + \delta)(\gamma'\tau' + \delta')\big)^{k} H_k(\tau, \tau')$.

The symmetry is derived from

(15.1c)    $A: \; H_k(\tau, \tau') = H_k(-\tau', -\tau)$.

The selection of the forms $H_k$ of weight $k$ is made so as to have simplest infinite behavior.
In effect, we have the following diagonals:


a 
3 
+ 
S$ 
i 


(15.2a) D;: τ (1 -- V2) = wy, 
(15.2b)    $D_2: \; \tau = -\tau' = \omega_2$.

Then $H_2, H_4, H_6$ are chosen uniquely as the modular forms of indicated weight for which
the diagonal behavior is like the Klein-Weber modular function $j(\omega_2)$ on $D_2$ and like the
Hecke $\sqrt{2}$-modular function $j_2(\omega_1)$ on $D_1$. (Recall $j_2(\omega_1) = j_2(-1/\omega_1) = j_2(\omega_1 + \sqrt{2})$.)
Thus (see [6] or [4])

(15.2c)    $D_1: \; 1/x = H_4/H_2^2 = j_2(\omega_1), \qquad H_6(\omega_1) = 0$,

(15.2d)    $D_2: \; 1/(xy) = H_6/H_2^3 = j(\omega_2), \qquad H_4(\omega_2) = 0$.

The field $\mathbb{Q}(x, y)$ is characterized by the symmetry (of $\tau \leftrightarrow -\tau'$, $\tau' \leftrightarrow -\tau$). The extended
or alternating field is $\mathbb{Q}(x, y, a)$ with $a$ determined by choice of sign in

(15.3a)    $(a^2 =)\; a(x, y)^2 = \tfrac{1}{2}(y + 4)(x^2y - 1728x - 288xy - 1024y^2 + 4xy^2)$,

according to work of Gundlach [6] and Nagaoka [16]. Here

(15.3b)    $a(x(\tau, \tau'), y(\tau, \tau')) = -a(x(-\tau', -\tau), y(-\tau', -\tau))$.

(The factor of $\tfrac{1}{2}$ will be needed to avoid irrationalities later on.)


16. Explicit two-isogenies. 

Three 2-isogenies come from $\tau \to (2 + \sqrt{2})\tau$, $\tau' \to (2 - \sqrt{2})\tau'$ (note the multipliers $2 \pm \sqrt{2}$
have norm 2 and preserve the signs). Actually, the three mappings arise concurrently by
coset properties in the Hilbert modular group

(16.1)    $(\tau, \tau') \to \begin{cases} \big((2 + \sqrt{2})\tau,\; (2 - \sqrt{2})\tau'\big) \\ \big(\tau/(2 + \sqrt{2}),\; \tau'/(2 - \sqrt{2})\big) \\ \big((\tau + 1)/(2 + \sqrt{2}),\; (\tau' + 1)/(2 - \sqrt{2})\big) \end{cases} \qquad (x, y, a)^{1} \to {}^{3}(X, Y, A)$.

This is analogous to the triple $\tau \to \{2\tau, \tau/2, (1 + \tau)/2\}$ for the ordinary (Klein) modular
group, which produces three conjugates $j(\tau) \to \{j(2\tau), j(\tau/2), j((1 + \tau)/2)\}$.

For simplicity, we begin by ignoring the mapping $a \to A$, leaving only $(x, y) \to (X, Y)$
(see Cohn [4]). We then start with a two-valued correspondence (to be resolved later by
the choice of sign of $a$).

Given $(x, y)$ the symmetrized $(X, Y)$ are determined by an incomplete intersection with
four cylindrical equations of co-dimension 2 (note the degrees of the variables):

(16.2a)    $f_1(\overset{3}{x}, \overset{2}{y}; \overset{3}{X}) = X^3x + (432 + 156y - xy)X^2x + x(xy + 144y - 1728)^2$
           $\qquad + (4x^2y^2 + 207x^2y + 1152y^2x + 19008yx + 62208x + 82944y^2)X = 0$,
(16.2b)    $(f_2 =)\; f_1(X, Y; x) = 0$,
(16.2c)    $f_3(\overset{2}{x}, \overset{3}{y}; \overset{3}{Y}) = Y^3y^2(y + 4) - (x - 4y + 48)y^2Y^2 + (-5y + 108)xyY$
           $\qquad + x(xy + 144y - 1728) = 0$,
(16.2d)    $(f_4 =)\; f_3(X, Y; y) = 0$.

The symmetry of $(x, y) \to (X, Y)$ is reminiscent of the symmetry of $x = j(\omega)$ and $X =
j(2\omega)$ in $\Phi_2(x, X) = 0$, the (cubic) Klein modular equation of order 2.


17. Uniqueness problem. As we see from the three 2-isogenies, for every $(x, y)$ there
should be only three pairings of $(X, Y)$ with $(x, y)$ (not nine from the determination of
three values of $X$ from $f_1 = 0$ and three values of $Y$ independently from $f_3 = 0$). The
equations $f_2 = 0$ or $f_4 = 0$ (generically) resolve the three legitimate pairings $(X, Y)$.

If we take discriminants of the cubics $f_1$ and $f_3$ separately,

(17.1a)    $X\text{-disc}(f_1) = 4xy^2(x^2y - 1728x - 288xy - 1024y^2 + 4xy^2)$
           $\qquad\qquad \cdot\,(3375x^2 - 746496y + 10368xy + 252x^2y + x^3y)^2$,
(17.1b)    $Y\text{-disc}(f_3) = 4(x^2y - 1728x - 288xy - 1024y^2 + 4xy^2)$
           $\qquad\qquad \cdot\,(432 - 72y + xy + 3y^2)^2y^4x$.

The fields $\mathbb{Q}(x, y, X)$ and $\mathbb{Q}(x, y, Y)$ are the same as we show next, but for now we may
note the discriminants agree to within square factors!
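The two discriminant formulas can be spot-checked with exact rational arithmetic. The sketch below encodes the cubics $f_1$ and $f_3$ of §16 by their coefficients (as read here from the equations, so the coefficient lists are an assumption about the source) and compares the general cubic discriminant with the factored forms at sample points.

```python
from fractions import Fraction


def cubic_disc(a, b, c, d):
    """Discriminant of a*T^3 + b*T^2 + c*T + d."""
    return (b * b * c * c - 4 * a * c ** 3 - 4 * b ** 3 * d
            - 27 * a * a * d * d + 18 * a * b * c * d)


def f1_coeffs(x, y):
    """Coefficients of f1 as a cubic in X."""
    return (x,
            x * (432 + 156 * y - x * y),
            4 * x * x * y * y + 207 * x * x * y + 1152 * y * y * x
            + 19008 * y * x + 62208 * x + 82944 * y * y,
            x * (x * y + 144 * y - 1728) ** 2)


def f3_coeffs(x, y):
    """Coefficients of f3 as a cubic in Y."""
    return (y * y * (y + 4),
            -(x - 4 * y + 48) * y * y,
            (-5 * y + 108) * x * y,
            x * (x * y + 144 * y - 1728))


def common_factor(x, y):
    return x * x * y - 1728 * x - 288 * x * y - 1024 * y * y + 4 * x * y * y


def x_disc_formula(x, y):
    s = 3375 * x * x - 746496 * y + 10368 * x * y + 252 * x * x * y + x ** 3 * y
    return 4 * x * y * y * common_factor(x, y) * s * s


def y_disc_formula(x, y):
    q = 432 - 72 * y + x * y + 3 * y * y
    return 4 * common_factor(x, y) * q * q * y ** 4 * x
```

The shared factor `common_factor` is the same quartic that appears in the conic for $a^2$, which is what makes the two discriminants agree up to squares.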


18. Algorithm for identifying field generators. We can explicitly link $Y$, the root
of $f_3$, with $X$, the root of $f_1$, by a construction using the following:

18.1 MAIN ALGORITHM. Let $K/k$ be an extension field of degree $n$ defined by $\xi$ where

(18.1a)    $(A(\xi) =)\; A_n\xi^n + A_{n-1}\xi^{n-1} + \cdots + A_0 = 0, \quad (A_i \in k)$.

Let some other element $\eta$ be defined by the equation

(18.1b)    $(B(\eta) =)\; B_n\eta^n + B_{n-1}\eta^{n-1} + \cdots + B_0 = 0, \quad (B_i \in k)$.

It is required to discover if $\eta$ lies in $K$, and if so to express $\eta$ in terms of $\xi$ as a polynomial
of degree $n - 1$,

(18.1c)    $\eta = C_{n-1}\xi^{n-1} + \cdots + C_0 \;(= C(\xi)), \quad (C_i \in k)$.

We look for the $n - 1$ unknown coefficients $C_i$. Note the converse process, finding $B(\eta)$
given $A(\xi)$ and $C(\xi)$, is taken care of by

(18.1d)    $B(\eta) = \xi$-resultant of $A(\xi)$ and $\eta - C(\xi)$.
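The converse process (18.1d) can be sketched without a computer algebra system. For monic $A$, the $\xi$-resultant of $A(\xi)$ and $\eta - C(\xi)$ coincides with the characteristic polynomial of the matrix $C(\text{companion}(A))$; the code below uses that equivalent formulation (a swapped-in technique, not the resultant computation of the text) with the Faddeev-LeVerrier recursion over exact fractions. All names are illustrative.

```python
from fractions import Fraction


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def companion(monic):
    """Multiplication-by-xi matrix; monic = [a_0, ..., a_{n-1}], leading 1 implied."""
    n = len(monic)
    M = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n - 1):
        M[j + 1][j] = Fraction(1)
    for i in range(n):
        M[i][n - 1] = -Fraction(monic[i])
    return M


def evaluate(poly, M):
    """Horner evaluation of C(xi) = sum poly[i] * xi^i at the matrix M."""
    n = len(M)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(poly):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += coeff
    return result


def charpoly(M):
    """Faddeev-LeVerrier; returns [1, c_{n-1}, ..., c_0]."""
    n = len(M)
    coeffs = [Fraction(1)]
    Mk = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        AM = matmul(M, Mk)
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        Mk = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs


def relation_for_eta(monic_A, C):
    """Monic B(eta) satisfied by eta = C(xi), highest coefficient first."""
    return charpoly(evaluate(C, companion(monic_A)))
```

For example, with $A(\xi) = \xi^3 - 2$ and $\eta = \xi^2$ the routine returns the coefficients of $\eta^3 - 4$.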


So even with $C(\xi)$ unknown, the $\xi$-resultant of $A(\xi)$ and $\eta - C(\xi)$ is a polynomial
$R(\eta)$ of degree $n$ whose coefficients are to be proportional to $B(\eta)$. By comparing the
coefficients of $R(\eta)$ and $B(\eta)$ we have $n - 1$ equations $T_i = 0$ for $i = 1, \cdots, n - 1$ (in the
$n - 1$ unknowns $C_i$ and the knowns $A_i$, $B_i$). Here the $T_i$ are of degree $i$ in the unknowns
$C_i$.

Then the unknowns $C_i$ are found by the use of resultants for successive elimination.

This process is tolerably easy where $n = 3$. We restrict further remarks to this case for
simplicity. For instance, $T_1$ is linear, so $C_0$ can be solved in terms of $C_1$ and $C_2$. Then $T_2$
is quadratic and $T_3$ is cubic in the unknowns $C_1$ and $C_2$. So the $C_1$-resultant is a sixth
degree polynomial in $C_2$, which we factor over $k$. In fact (note the degrees),


| 6 6 9 9 22 2 838 6 12 
(18.1e)    $C_1$-resultant$(T_2, T_3) = R(C_2; A_0, A_1, A_2, A_3, B_0, B_1, B_2, B_3) = 0 \quad (n = 3)$.


If a solution $(C_0, C_1, C_2)$ does exist, there must be a linear factor giving $C_2$. Going back
we have a linear factor for $C_1$ in $T_2$, and a formula for $C_0$, completing the solution.

For efficiency, it is desirable to store the resultants and equations $T_i$. For instance, for
$n = 3$, we would solve the linear $T_1$ to eliminate (say) $C_0$ in $T_2$ and $T_3$. So we store $T_1$,
$T_2$, and the $C_1$-resultant of $T_2$ and $T_3$ (a polynomial of degree 6 in $C_2$).


18.2 EXTRANEOUS SOLUTIONS. This process will always produce extraneous solutions.
For instance $T_2$ is quadratic so it has two roots if it has one. Actually, in the case of a
normal equation, each of the $n$ roots of $B(\eta)$ is expressed by some $C(\xi)$, but we still have
$n$ extraneous solutions. So verification of $C(\xi)$ is still necessary.


18.3 MONIC EQUATIONS. If we consider the fact that the degrees of $A_3$ and $B_3$ are
exceptionally high in the $C_1$-resolvent (even for $n = 3$), it is advantageous to change
variables before executing the algorithm so as to make $\xi$ and $\eta$ monic (e.g., $\xi \to A_n\xi$ and
$\eta \to B_n\eta$, so $A_n \to 1$, $B_n \to 1$). In practice this helps avoid overflow.


19. Improved parameters. With this algorithm (using $k = \mathbb{Q}(x, y)$) we determine
$X \in \mathbb{Q}(x, y, Y) = K$ from $f_3 (= A(Y))$ and $f_1 (= B(X))$ as


(19.1a) 
(X =) X(Y;z,y) = [—(yx — 1728 + 144y)(y? + ἂν -- 36) /dx | 
+ Y[(z?y + 21602 — 360yx — 20736y — 81τυ + 3456y? + 4y%x — 144y*)y]/dx 
+ Y7|(yx + 3x + 432 — 36y)(y + 4)y?]/dx, 
(19.1b) 


dx = (yx+ 432 — 72y + 3y”)z. 


At the expense of much more complicated relations, we now use the single-valued nota- 
tion, in which the extended modular function field is birationally parametrized as 


(19.2a) Q(x,y,a) = Q(s,y)


by introducing the new parameter 


(19.2b) (s =) s(a,x,y) = [a + 4(y + 4)(35y + 108)] / (x + 200y + 864).


Conversely, x = x_s(s,y) and a = a_s(s,y), where

(19.3a) x_s(s,y) = [16(25y + 108)s^2 - 16(y + 4)(35y + 108)s + 4(y + 4)(49y^2 + 288y + 432)]
                   / [y(y + 4) - 2s^2],
(19.3b) a_s(s,y) = [-8(y + 4)(35y + 108)s^2 + 36(11y + 12)(y + 4)^2 s - 4y(y + 4)^2 (35y + 108)]
                   / [y(y + 4) - 2s^2].


The derivation (see Müller [15]) comes from the interpretation of the equation (15.3a)
for a^2 as a conic in a, x over Q(y). Then by the “method of Pythagoras,” all we need for
parametrization is a rational point in Q(y), namely
(19.4a) a_0 = -4(y + 4)(35y + 108), x_0 = -200y - 864.

Then the slope is the new parameter


(19.4b) s = (a - a_0)/(x - x_0).
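
The round trip between (a, x) and the slope s can be checked in exact arithmetic. The sketch below transcribes (19.2b) and (19.3a,b); the function names are ours. The conic is taken in the form 2a^2 = (y + 4)(x^2 y - 1728x - 288xy - 1024y^2 + 4xy^2), which is an assumption on our part: it is read off from the expression in (19.6a) and the fixed-point table of Section 20 rather than quoted from (15.3a).

```python
from fractions import Fraction as F

def x_s(s, y):
    """(19.3a); s and y should be Fractions so that division stays exact."""
    num = (16*(25*y + 108)*s**2 - 16*(y + 4)*(35*y + 108)*s
           + 4*(y + 4)*(49*y**2 + 288*y + 432))
    return num / (y*(y + 4) - 2*s**2)

def a_s(s, y):
    """(19.3b)."""
    num = (-8*(y + 4)*(35*y + 108)*s**2 + 36*(11*y + 12)*(y + 4)**2*s
           - 4*y*(y + 4)**2*(35*y + 108))
    return num / (y*(y + 4) - 2*s**2)

def slope(a, x, y):
    """(19.2b): slope of the line from the rational point (x_0, a_0) of (19.4a)."""
    return (a + 4*(y + 4)*(35*y + 108)) / (x + 200*y + 864)

def on_conic(a, x, y):
    """ASSUMED form of the conic (15.3a), inferred from (19.6a); see the lead-in."""
    return 2*a**2 == (y + 4)*(x**2*y - 1728*x - 288*x*y - 1024*y**2 + 4*x*y**2)
```

With y = 12 the two slopes s = 8 and s = 48/5 both give x = 576, with a = -3072 and a = +3072 respectively, in agreement with the first two rows of the table in Section 20.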


Now, the multiplication on (τ, τ') by (2 + √2, 2 - √2) (and conjugates) leads to a
three-valued mapping of


(19.4c) (x,y,a,s)^1 →^3 (X,Y,A,S).


In terms of the one-valued parametrization we want just the mapping
(19.4d) (s,y)^1 →^3 (S,Y).


We pay for the “better” parametrization by more complicated relations. The cubic f_3
which defines Y now becomes


(19.5a) f_3s(Y;s,y) = Y_3 Y^3 + Y_2 Y^2 + Y_1 Y + Y_0 = 0,


with the more formidable coefficients 


(19.5b)
Y_3 = y^2 (y + 4)(y^2 + 4y - 2s^2)^2,

(19.5c)
Y_2 = -8y^2 (y + 4)(y^2 + 4y - 2s^2)(24y^2 + 150y + 216 - 70sy - 216s + 51s^2),

(19.5d)
Y_1 = -4y(5y - 108)(y^2 + 4y - 2s^2)
      (49y^3 - 140sy^2 + 484y^2 + 1584y - 992sy + 100s^2 y - 1728s + 1728 + 432s^2),
(19.5e)
Y_0 = 16(49y^4 + 520y^3 - 140y^3 s + 1296y^2 + 100y^2 s^2 - 992sy^2 - 1728sy
      + 360s^2 y + 864s^2)(49y^3 - 140sy^2 + 484y^2 + 1584y - 992sy
      + 100s^2 y - 1728s + 1728 + 432s^2).


Note that the equation for X(Y;x,y) can now be rewritten by substitution as X_s(Y;s,y),
with similar loss of simplicity.


Given s and y, for each of these three (generic) values of Y there must be one S(s,y;Y),
so we have the three-valued correspondence (s,y)^1 →^3 (S,Y) precisely formulated
(generically). This is a much larger computation. The “total” equations are too long to be
manageable (or even instructive), but we outline the component steps.


First, we define p as the ratio A/a = a(X,Y)/a(x,y), i.e.,

(19.6a) p^2 = [(Y + 4)(X^2 Y - 1728X - 288XY - 1024Y^2 + 4XY^2)]
              / [(y + 4)(x^2 y - 1728x - 288xy - 1024y^2 + 4xy^2)].

If we substitute X = X(Y;x,y) then the Y-resultant of the above equation with f_3 yields
a cubic in p^2. This equation factors into two cubics (trivially equivalent under p → -p):


(19.6b)
0 = p_0 + p_1 p + p_2 p^2 + p_3 p^3, where
(19.6c) 
Po = (zy — 1728 + 144y)(x?y? + 2x7y — 904zy? — 9920zry — 16000z + 165888y"), 
(19.6d) 


ρι = y(—746496y* + 2υ 31 — 1055y3x? + 9289728y? — 7741440y%, 
— 4032zy4 + 827y* — 140082r7y? — 3365632zy" — 213184y%r 
— 9040896z — 23280xr7y — 82944002 + 42%y?), 


(19.6d) 
p2 = — y*(—2zy + 480 + 464y + 54y? — zy”)(x?y — 17282 — 288ry — 1024y” + 4zy’), 
(19.6e) 
_ ps = 8 (y + 4)? (2?y — 1128: — 288zry — 1024y? + 4zy’). 


We note (with some relief!) that no radicals (like √2) were needed for factorization leading
to this equation!




By applying the Main Algorithm again, we express p directly in Q(Y,x,y) as


19.7a 
e =) p(Y3z,y) = [(yx — 1728 + 144y) (4y32 — δ16υ + εἶν" — 1742y? 
+ 6912y? + οὖν + 576yz + 4320z)|/d, 
— Y[2y(—1488y4 + 8ry4 + 2y32? — 643y%x + 38976y> — 292608y" 
~ 5iz7y? + 82522y? + 470016y — 108z7y — 11664yzr — 139968z)]|/d, 
— Y7[y?(x?y? + 227y — 4322 — 235008 — 760yx — 12672y — 211zy” 
+ 2400y? + 4y%x — 744y%)]/d,, 
(19.7b)
d_p = y(x^2 y - 1728x - 288yx - 1024y^2 + 4xy^2)(yx + 432 - 72y + 3y^2).


The s, y parameters can now be made basic. Since p ∈ Q(Y,x,y) ⊂ Q(s,Y,y), the same
can now be said for A = ap and finally for

(19.8a) (S =) S(Y;s,y) = [A + 4(Y + 4)(35Y + 108)] / (X + 200Y + 864).

(In fact we use a = a_s(s,y), X = X(Y;x,y) and x = x_s(s,y) to obtain S as the ratio of
two quadratics in Y with coefficients in Q[s,y].)
The “final” result, by another use of the algorithm, is

(19.8b) S = S_0(s,y) + Y S_1(s,y) + Y^2 S_2(s,y),

which would, unfortunately, involve unmanageable polynomials of Q(s,y) in the rational
coefficients S_i(s,y). We content ourselves with the implicit result.


20. Fixed points as illustrations. The fixed points of the system (16.2a-d), where
(x,y) = (X,Y) (see Cohn [4]), lead to the following table:


x = X   y = Y   a = A   A/a (= p)   s = S
576 12 +3072 1 48/5 
” ” —3072 1 8 
ἩΤΕΥΙΤ Ξϑ3- ὅν 17 Sq (FA) 94 -- V17)° 1 8-- 24 119 2 34 
” ” —~ (ΞῈ 17)16(4 _ V17)° 1 8-- 24 1τ 2--, 2ἡν., 34 
{π| Ξϑ ΒΨ ΙΤ ὦ (Ξ.57)15(4 + V17)9 1 8+24 11+ 2--ν 34 
” og — $5 (3997) 18(4 + V17)9 1 8Έ24 17 Θυ 2 Ἐν 84 
—64 —4 0 —1 


There are generally two values for s (= S) corresponding to the two roots ±a(x,y). Note
the role of conjugates in Q(√2, √17). Also note in the last case that the locus a = A = 0
can be a point of indeterminacy for some parametrizations.
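
The first two rows of the table give a quick consistency check on the cubic (19.5a): at a fixed point, Y = y must be one of its three roots. The sketch below transcribes the coefficients (19.5b-e) as we read them (the function names are ours) and evaluates the cubic in exact arithmetic at (s, y) = (8, 12) and (48/5, 12).

```python
from fractions import Fraction as F

def cubic_coeffs(s, y):
    """Coefficients Y_3, Y_2, Y_1, Y_0 of (19.5a), transcribed from (19.5b-e)."""
    d = y**2 + 4*y - 2*s**2
    q = (49*y**3 - 140*s*y**2 + 484*y**2 + 1584*y - 992*s*y
         + 100*s**2*y - 1728*s + 1728 + 432*s**2)
    Y3 = y**2*(y + 4)*d**2
    Y2 = -8*y**2*(y + 4)*d*(24*y**2 + 150*y + 216 - 70*s*y - 216*s + 51*s**2)
    Y1 = -4*y*(5*y - 108)*d*q
    Y0 = 16*(49*y**4 + 520*y**3 - 140*y**3*s + 1296*y**2 + 100*y**2*s**2
             - 992*s*y**2 - 1728*s*y + 360*s**2*y + 864*s**2)*q
    return Y3, Y2, Y1, Y0

def f3s(Y, s, y):
    """Left-hand side of (19.5a)."""
    Y3, Y2, Y1, Y0 = cubic_coeffs(s, y)
    return Y3*Y**3 + Y2*Y**2 + Y1*Y + Y0
```

Both evaluations f3s(12, 8, 12) and f3s(12, 48/5, 12) vanish. In both cases the coefficients satisfy Y_3 y^3 : Y_2 y^2 : Y_1 y : Y_0 = 1 : -3 : 1 : 1, so with t = Y/y the cubic is proportional to t^3 - 3t^2 + t + 1 = (t - 1)(t^2 - 2t - 1), whose other two roots are t = 1 ± √2.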




21. Remark on algebraic invariants. Something sorely lacking in this exposition
is explicit algebraic expressions for the Hilbert modular invariants (e.g., x, y, a, s, etc.)
in terms of coefficients of the hyperelliptic equation, analogously with classical elliptic
formulas for j, such as

(21.1a) w^2 = z^3 + Az + B  =>  j = 1728 · 4A^3 / (4A^3 + 27B^2).

Thus j is seen directly as an elliptic curve parameter, e.g.,

(21.1b) B = -27j / [4(j - 1728)t^3], A = tB (t arbitrary).

Armand Brumer has suggested that corresponding relations should be known from
specialization of the Siegel modular functions, but it would be helpful to have explicit results
in palatable form.
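
The pair (21.1a), (21.1b) can be checked mechanically. The sketch below (function names are ours) builds A and B from a prescribed j and an arbitrary nonzero t, then recomputes j from the curve; the values j = 0 and j = 1728 are excluded, since the formula for B degenerates there.

```python
from fractions import Fraction as F

def j_invariant(A, B):
    """(21.1a) for the curve w^2 = z^3 + A z + B."""
    return 1728 * 4 * A**3 / (4 * A**3 + 27 * B**2)

def curve_from_j(j, t):
    """(21.1b): coefficients (A, B) with prescribed j; requires j not in {0, 1728}, t != 0."""
    B = -27 * j / (4 * (j - 1728) * t**3)
    A = t * B
    return A, B
```

Different choices of t give different pairs (A, B) with the same j.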


REFERENCES

1. J.-B. Bost and J.-F. Mestre, Moyenne arithmético-géométrique et périodes des courbes de genre 1 et 2, Gazette des Mathématiciens (Soc. Math. de France) (1988), 36-64.
2. A. Brumer, (Unpublished communication, March 1997).
3. J.W.S. Cassels and E.V. Flynn, “Prolegomena to a Middlebrow Arithmetic of Curves of Genus 2,” Cambridge Univ. Press, 1996.
4. H. Cohn, An explicit modular equation in two variables and Hilbert’s twelfth problem, Math. of Comput. 38 (1982), 227-236.
5. H. Cohn, Introductory remarks on complex multiplication, Internat. Journ. of Math. and Math. Sci. 5 (1982), 675-690.
6. K.-B. Gundlach, Die Bestimmung der Funktionen zu einigen Hilbertschen Modulgruppen, Journ. für die reine und angewandte Math. 220 (1965), 109-153.
7. E. Hecke, Höhere Modulfunktionen und ihre Anwendung an der Zahlentheorie, Math. Annalen 71 (1912), 1-37.
8. F. Hirzebruch, Hilbert modular surfaces, Enseignement Mathématique 19 (1973), 183-281.
9. E. Howe, Constructing distinct curves with isomorphic Jacobians, Journ. of Number Theory 56 (1996), 381-390.
10. G. Humbert, Sur les fonctions abéliennes singulières I, Journ. de Math. (5) 5 (1899), 233-350.
11. J. Igusa, Arithmetic variety of moduli for genus two, Annals of Math. 72 (1960), 612-649.
12. L. Königsberger, Über die Transformation der Abelschen Functionen erster Ordnung, Journ. für die reine und angewandte Math. 64 (1865), 17-42.
13. J.-F. Mestre, Familles de courbes hyperelliptiques à multiplications réelles, Arithmetic algebraic geometry (1989), 193-208; Prog. Math., 89, Birkhäuser, Boston 1991.
14. J.-F. Mestre, Courbes hyperelliptiques à multiplications réelles, Séminaire de Théorie des Nombres, Univ. Bordeaux I, Talence (1987-1988), Exp. 34, 6 pp.
15. R. Müller, Hilbertsche Modulformen und Modulfunktionen zu Q(√8), Math. Annalen 266 (1983), 83-103.
16. S. Nagaoka, On Hilbert modular forms, Proc. Japan Acad. Ser. A 59 (1983), 346-348.
17. B. Poonen, Computational aspects of curves of genus at least 2, Algorithmic Number Theory (ANTS II, H. Cohen, ed.) (1996), 283-306; Second International Symposium, Talence, France; Springer-Verlag (Lecture Notes in Computer Science 1122).
18. H.L. Resnikoff, Singular Kummer surfaces and Hilbert modular forms, Rice Univ. Studies 59 (1973), 109-121.
19. K. Rohn, Transformation der hyperelliptischen Functionen p = 2 und ihre Bedeutung für die Kummersche Fläche, Math. Annalen 15 (1879), 315-354.


IDA Center for Computing Sciences 
Bowie, MD 20715-4300 
email: hcohn@super.org 


HUMBERT’S CONIC MODEL AND THE KUMMER SURFACE 


HARVEY COHN 


Abstract. In his Cours d’Analyse in 1904, Georges Humbert used the parametrization of
a pencil of conics through four points by the Weierstrass ℘-function to prove theorems of
geometry and mechanics. This method is implicit in his earlier applications of Kummer
surfaces; for instance, his criterion for real multiplication by √2 uses the special
“quarter-period” configuration in the pencil.


0. Introduction. This talk is intended to relate two topics found in work of Georges 
Humbert (1859-1921). The first is an easy but unconventional parametrization of the conic 
(Part I), and the second is a conventional parametrization of the Kummer surface (Part 
II). The interrelation of these two topics is thinly sketched for the special goal of real
multiplication by √2 (Part III). All this is over the field C.

Some of the best work on the parametrization of the Kummer surface was done by 
Humbert [8] in 1895. It represented the highest state of art in the use of theta-functions 
(which has survived intact) and the algebraic geometry of surfaces (which has transformed 
itself almost beyond recognition). Indeed, more modern treatments (see [3]) tend to use 
purely algebraic tools altogether, rejecting theta-functions. Some very readable accounts 
of hyperelliptic (genus two) jacobian manifolds are given in [14] and [18], and valuable 
bibliographies appear in [3] and [16]. 

After his advanced work, Humbert published an elliptic parametrization of the family
of conics through four given points [10]. This tool has remained largely neglected despite
the combination of its simplistic appeal (see [4], [13]) and relevance to the more prized
real multiplication (see [2], [19]).


PART I. ELLIPTICAL PARAMETERS ON A CONIC PENCIL 
1. Two theorems. Humbert [10] displayed two enticing results: 


1.1 EULER. Given two circles in the plane, one containing the other. It is generally 
impossible to construct a triangle which is inscribed in the larger and circumscribes the 
smaller, but when this is possible there is a continuum of such triangles. 


The obvious trivial case is where the circles are concentric. Only when the ratio of radii
is 2 will there be such a triangle, and of course the configuration rotates.

While Euler simply counted parameters, Poncelet constructed the solution by involutions 
of point sets. Jacobi used the elliptic parametrization we shall discuss in the next section. 
The next theorem provides his special application. 


1.2 JACOBI. Consider a weight following a circular pendulum with position P(t) in the
plane of the pendulum at time t. Then for any (real) constant z, the chord connecting P(t)
and P(t + z) will have as envelope a circular arc joining the high points of the pendulum
path.


Mathematics Subject Classification. Primary 11G15, 51A05. Key words and phrases. Poncelet, Humbert, real multiplication.
Presented at the New York Number Theory Seminar 6 Nov. 1997.

Typeset by AMS-TeX




D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 




2. Parametrizations of a conic. For simplicity we agree that we are in the field C
and conics are understood to be nondegenerate (not consisting of lines).

In the projective plane, to begin with, all conics are the same and all possible birational 
parametrizations are equivalent (affinely) to the simplest imaginable, 


(2.1a) x = t, y = t^2.
Actually, the most general parametrization is
(2.1b) y : x : 1 = (a_11 t^2 + a_12 t + a_13) : (a_21 t^2 + a_22 t + a_23) : (a_31 t^2 + a_32 t + a_33),

with matrix form (“tr” for transpose),

(2.1c) (y,x,1)^tr = M (t^2,t,1)^tr, M = (a_ij), det M ≠ 0.
It is clear that if M^{-1} acts on both sides, we get the new coordinates for (2.1a),
(2.1d) (y',x',1)^tr = M^{-1} (y,x,1)^tr = (t^2,t,1)^tr.
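
The passage from (2.1c) to (2.1d) is plain linear algebra and can be illustrated in exact arithmetic. In the sketch below the matrix M is a hypothetical example with det M = 3, and the helper names are ours; the coordinates are homogeneous, so the third entry of M (t^2, t, 1)^tr need not equal 1 before normalization.

```python
from fractions import Fraction as F

def mat_vec(M, v):
    """Product of a 3x3 matrix with a column vector."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def det3(M):
    return (M[0][0] * (M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1] * (M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2] * (M[1][0]*M[2][1] - M[1][1]*M[2][0]))

def inverse3(M):
    """Inverse via the adjugate; raises if det M = 0 (degenerate parametrization)."""
    d = det3(M)
    if d == 0:
        raise ValueError("det M must be nonzero")
    def cof(i, j):
        rows = [r for r in range(3) if r != i]
        cols = [c for c in range(3) if c != j]
        minor = (M[rows[0]][cols[0]] * M[rows[1]][cols[1]]
               - M[rows[0]][cols[1]] * M[rows[1]][cols[0]])
        return (-1) ** (i + j) * minor
    # inverse[i][j] = cofactor(j, i) / det
    return [[F(cof(j, i)) / d for j in range(3)] for i in range(3)]

# Hypothetical coefficient matrix (a_ij) for (2.1b); any matrix with det != 0 would do.
M = [[1, 2, 0],
     [0, 1, 1],
     [1, 0, 1]]

def conic_point(t):
    """Homogeneous coordinates (y : x : 1-slot) of the point with parameter t, as in (2.1c)."""
    return mat_vec(M, [t * t, t, 1])
```

Applying inverse3(M) to conic_point(t) returns (t^2, t, 1), which is the normal form (2.1a) in the new coordinates.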


Now by contrast, we create a unirational (two-to-one) parametrization of an arbitrary 
conic section. We consider a (complex) elliptic curve in Weierstrass form, 


(2.2a) y^2 = 4(x - e_1)(x - e_2)(x - e_3), (x = ℘(u), y = ℘'(u)),
(2.2b) e_i ≠ e_j (i ≠ j); e_1 + e_2 + e_3 = 0.

For uniform notation, we write the parameters

(2.2c) {t_0, t_1, t_2, t_3} = {∞, e_1, e_2, e_3}.

We recall the periods and values at the half-periods are

(2.2d) {2ω_1, 2ω_2, 2ω_3 (= 2ω_2 + 2ω_1)},

(2.2e) ℘(ω_j) = e_j, ℘'(ω_j) = 0, (j = 1,2,3).

The period lattice is

(2.2f) Ω = {2n_1 ω_1 + 2n_2 ω_2}, (n_1, n_2 ∈ Z).


Remembering that t is the birational parameter of the conic (2.1b) in the (projective
complex) xy-plane, we replace the parameter by u (writing “[u]” symbolically for the point
(x,y)) as follows:


(2.3a) [u]: (x,y) = (x(t), y(t)) for t = ℘(u).
Since ℘(u) = ℘(-u), we have the two-to-one correspondence
(2.3b) [u] → (x,y), [-u] → (x,y), (x,y) → {[u], [-u]}.


If two symbols [u] and [z] were “added” we would obtain only two chords from [u] to
[±u + z], namely
(2.3c) {chord([u][u + z]), chord([u][u - z])}.
(The term “chord” will also refer to the corresponding secant).
Above all, the values of [u] are periodic (2.2c) which makes for even more interesting
results. Hence the chords in (2.3c) are distinct unless u or z is a half-period (2u or 2z lies
in Ω).




3. Pencils of conics. 


3.1 MAIN THEOREM. CONSTRUCTION OF THE PENCIL OF CONICS THROUGH FOUR
POINTS OF A GIVEN CONIC. Given a conic C(0) with four designated base points in
parameter t = t_0, t_1, t_2, t_3. Let us introduce an elliptic function t = ℘(u) such that those
four points correspond (uniquely) to u = 0, ω_1, ω_2, ω_3 (with the periods as described earlier
in (2.2d)). Then each point of C(0) is described generically as [u] and [-u].

Let z be a fixed parameter and draw the chords (secants) connecting [u] with [u + z] and
connecting [u] with [u - z], assuming z is not a half-period. Then the chords vary with u
sweeping out a conic C(z) as their envelope with points of contact P and -P (see Figure
1). This is the most general conic of the pencil.

Finally, for z a half-period, ω_1, ω_2, ω_3, C(z) is one of three limiting degenerate conics,
namely the diagonals of the complete quadrangle determined by the base points.


The essence of the construction is that the envelope of the chords is a curve from
which we can draw two tangents from an external point, hence a conic C(z). When
u ∈ {0, ω_1, ω_2, ω_3}, then there is only one tangent, which corresponds to the four base
points (of intersection of C(0) with C(z)).

Let us next consider the three degenerate conics of the pencil, C(ω_i) with intersecting
lines at A_i (as in Figure 1). Here, A_i is the intersection of the chords (of C(ω_i)) from [0]
to [ω_i] with (cyclically) the chord from [ω_{i+1}] to [ω_{i+2}]. Also the points [u] and [u + ω_i]
are in involution. This involution is created from the requirement that the chord from [u]
to [u + ω_i] must go through A_i, and indeed C(ω_i) is the “degenerate envelope” of these
chords.

According to this same involution, the lines from A_i which are tangent to C(0) (not
shown in Figure 1) must occur where ±u = u + ω_i mod Ω, i.e., at quarter-periods, viz.,
±ω_i/2, ±(ω_i + ω_{i+1})/2. There are six of them, identified with the tangencies on C(0) from
A_1, A_2 and the imaginary tangencies from A_3 (using the notation of Figure 1).


3.2 COROLLARY. Any given pair of nontangent conics, (not equivalent to concentric 
circles), can be embedded in a pencil so as to be C(0) and C(z) for a proper choice of 
parameter z. 


The cases considered are those where the conics have four distinct points of intersection, 
which become the base points. Of course all points are projective-complex. 


4. Proofs for §1. For Theorem 1.1, we consider two circles (or any other conics indeed)
as C(0) and C(z) of a pencil. Then a triangle inscribed in C(0) and circumscribing C(z)
must have on C(0) the vertices


(4.1a) S = {[u], [u + z], [u - z]},

and these must be the same if we start from u → u + z or u → u - z. Easily, [u + 2z] = [u - z],
etc., so

(4.1b) 3z ≡ 0 mod Ω.


Thus we can “rotate” the configuration by making u vary.
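
The candidate parameters z in (4.1b) can be enumerated combinatorially. The sketch below is only a count, under the convention (ours) that z = (a/n)·2ω_1 + (b/n)·2ω_2 is recorded by the integer pair (a, b) mod n; it lists the points with nz in the lattice but no smaller multiple in it, and then identifies z with -z. It makes no claim about which of these parameters are realized by real circles.

```python
from math import gcd

def primitive_division_points(n):
    """Pairs (a, b) mod n for z = (a/n)*2w1 + (b/n)*2w2 of exact order n modulo the lattice."""
    return [(a, b) for a in range(n) for b in range(n)
            if gcd(gcd(a, b), n) == 1]

def classes_up_to_sign(points, n):
    """One representative from each pair {z, -z}."""
    seen, reps = set(), []
    for (a, b) in points:
        if (a, b) not in seen:
            reps.append((a, b))
            seen.add((a, b))
            seen.add(((-a) % n, (-b) % n))
    return reps
```

For n = 3 there are 8 such points, hence 4 classes up to sign; for the primitive quarter-periods (n = 4) there are 12 points and 6 classes.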


Figure 1, Generic real case


There is a similar result for an n-gon rather than a triangle. It corresponds to n-points
of division of the period structure Ω. It would be dependent on the parity of n, and, worse,
it should require a painful sorting out of reentrant polygons. We do not pursue this matter
here (see [10]).


Theorem 1.2 is again relegated to the literature [10] with the remark that the spherical
pendulum is parametrized by the ℘-function.


5. The quarter-period configuration of the pencil. We consider the special case
of a primitive quarter-period


(5.1) 4z ∈ Ω, (2z ∉ Ω).


This is shown in Figure 2, where two base-points are real and two imaginary.

Here, [u] creates its pair [u ± z] but [u + 2z] creates the same pair. (Remember 3z ≡
-z mod Ω). If we look at the four points just mentioned, the chords produce the following
tangencies at C(z):


chord([u], [u + z]) is tangent at “5”
chord([u], [u - z]) is tangent at “6”
chord([u + 2z], [u + z]) is tangent at “4”
chord([u + 2z], [u - z]) is tangent at “3.”


As we saw earlier, since 2z is a half-period, the chords between [u], [u + 2z] and [u + z], [u - z]
go through A_2 (the center of the degenerate conic).

Additionally, if u = z/2 or 3z/2, one chord joins them and another becomes tangent to
C(0). This produces two points of tangency of C(z), written as “1” and “2.”

These six points of tangency on C(z) are a very valuable set. We first ask how many
degrees of freedom are in the set. There is one module in the four base points on C(0),
essentially the j-invariant of the elliptic curve (2.2a). There is one other module, the value
of u. Let s_i be the (biunique) t-parameter of the tangent point “i” on C(z). Then Humbert
[9] showed that the hyperelliptic curve


(5.2a) y^2 = (x - s_1)(x - s_2)(x - s_3)(x - s_4)(x - s_5)(x - s_6)

has modules creating real multiplication by √2, or otherwise expressed, modules
transformed by the Hilbert modular function for √2 (see [6]).
In more convenient parameters (see [4]), the quintic (5.2a) becomes


(5.2b) νυ = a(x — 1)(x — 52)(5 — a;)(z — a2), 
r—rs+s*+s rt+rs+s*—s 
(5.2c) αι = τ-----------, & = r——_ 
| r+rs+s*—s r—rs+s*+s 


6. The pencil as a symmetrized period manifold. The periodic structure C/Ω
(the parallelogram) is the exact image of the elliptic curve (2.2a). Now C(z) is by definition
periodic in z, but also it is symmetric, so
(6.1a) C(z) = C(z + 2ω_i) = C(-z).

So the usual parallelogram is reduced by several identifications. Generally, from (6.1a),
(6.1b) C(ω_i + z) = C(ω_i - z).
So there are centers of symmetry at each of {0, ω_1, ω_2, ω_3}. It is easy to see that under
the symmetry the manifold for z, namely (C/Ω)/{±u}, is of genus 0. Correspondingly, the
manifold of [u] can be the (birationally parametrizable) conic.




PART II. THE KUMMER SURFACE 


7. The symmetrized jacobian. We make the generalization from elliptic functions
(of one variable and two independent periods) to hyperelliptic functions of two variables
with four periods. We restrict ourselves to period systems reducible to the columns of the
so-called Riemann matrix


(7.1a) R = ( 1 0 a b )
           ( 0 1 b c ),


with three complex parameters a, b, c whose imaginary parts form a positive definite matrix
inside (7.1a),


(7.1b) Im(aξ^2 + 2bξη + cη^2) > 0, for real (ξ,η) ≠ (0,0).


Such period matrices come from the periods of hyperelliptic integrals (of genus 2). The
quadruple periods form a lattice L in C^2 (with complex variables u and v), viz.,


(7.1c) Periods of (u,v)^tr: L = {G_1 (1,0)^tr + G_2 (0,1)^tr + H_1 (a,b)^tr + H_2 (b,c)^tr},


(7.1d) G_1, G_2, H_1, H_2 ∈ Z.


The manifold of C^2/L is a period parallelepiped in two complex dimensions. It is called
a “jacobian” manifold because of its original application (see [14]) where u and v are
hyperelliptic abelian integrals of the first kind with the periods shown.

Functions defined on it are quadruply periodic (meromorphic) functions, which may
be represented as the quotients of special holomorphic functions. These are not quite
quadruply periodic but have exponential periodicity factors of a special form. They are
called normal theta-functions of order N (see [5]) and are defined as a linear space (over
C) by the period relations


(7.2a) Θ_N(u,v) = Θ_N(u + 1,v) = Θ_N(u,v + 1),
(7.2b) Θ_N(u,v) = exp 2πi[(u + a/2)N] Θ_N(u + a,v + b)
                = exp 2πi[(v + c/2)N] Θ_N(u + b,v + c).


Special attention is given to even theta-functions defined by
(7.2c) Θ_N(u,v) = Θ_N(-u,-v).


It can be shown, a priori, that the space of normal theta-functions is of dimension N^2
and the subspace of even functions is of dimension (N^2 + 4)/2 when N is even. (Odd
theta-functions are the complementary subspace; they have a (-1) factor in (7.2c), but
they are not needed here).


Analogously with elliptic functions, we say the jacobian manifold is C^2/L. The
symmetrized jacobian manifold is (C^2/L)/{±(u,v)}; it is the object that corresponds to a
Kummer surface.


8. Formulas for theta-functions. We give only the briefest survey, with minimal
proofs.
The best known example of a normal theta function is of order N = 1:


(8.1a) θ(u,v) = Σ_{m,n = -∞}^{+∞} exp 2πi[(mu + nv) + φ(m,n)/2],
(8.1b) φ(m,n) = am^2 + 2bmn + cn^2.


Trivially, θ(u,v)^N is of order N but more sophisticated examples are needed.
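
The relations (7.2a-c) for N = 1 can be observed numerically on a truncated version of the series (8.1a). The sketch below is only an illustration: the truncation bound, the sample values of a, b, c (chosen with positive definite imaginary part), and the function name are our assumptions.

```python
import cmath

def theta(u, v, a, b, c, terms=12):
    """Truncated double sum (8.1a) over -terms <= m, n <= terms."""
    total = 0j
    for m in range(-terms, terms + 1):
        for n in range(-terms, terms + 1):
            phi = a*m*m + 2*b*m*n + c*n*n          # (8.1b)
            total += cmath.exp(2j * cmath.pi * (m*u + n*v + phi/2))
    return total

# Sample parameters: Im of (a b; b c) is positive definite (1.0*1.2 - 0.25^2 > 0).
a, b, c = 1j, 0.25j, 1.2j
u, v = 0.3 + 0.1j, -0.2 + 0.05j

period_defect = abs(theta(u + 1, v, a, b, c) - theta(u, v, a, b, c))            # (7.2a)
quasi_defect = abs(theta(u, v, a, b, c)
                   - cmath.exp(2j * cmath.pi * (u + a/2)) * theta(u + a, v + b, a, b, c))  # (7.2b)
even_defect = abs(theta(u, v, a, b, c) - theta(-u, -v, a, b, c))                # (7.2c)
```

All three defects come out at the level of floating-point rounding, because the terms of the series decay like exp(-π Im φ(m,n)) and the truncation error is negligible for these parameters.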
We must construct 16 auxiliary theta-functions (of order one) but with characteristics
defined by the quadruples:
(8.2a) g_1, g_2, h_1, h_2 ∈ {0,1}.

This introduces signs into the relations (7.2ab). We define 


(8.2b) Θ[g_1 g_2; h_1 h_2](u,v)
       = Σ_{m,n} exp 2πi[(g_1/2 + m)u + (g_2/2 + n)v + (m h_1 + n h_2)/2 + (1/2)φ(g_1/2 + m, g_2/2 + n)].


If the characteristics are all 0, then of course we are back to (8.1a). (It is possible to define
such functions as in (8.2b) with order n, but they are not needed here, compare [14]).


The modified period and symmetry relations are 


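(8.3a) Θ[g_1 g_2; h_1 h_2](u,v) = (-1)^{g_1} Θ[g_1 g_2; h_1 h_2](u + 1,v)
                                = (-1)^{g_2} Θ[g_1 g_2; h_1 h_2](u,v + 1),

(8.3b) Θ[g_1 g_2; h_1 h_2](u,v) = (-1)^{h_1} exp 2πi(u + a/2) Θ[g_1 g_2; h_1 h_2](u + a,v + b)
                                = (-1)^{h_2} exp 2πi(v + c/2) Θ[g_1 g_2; h_1 h_2](u + b,v + c),

(8.3c) Θ[g_1 g_2; h_1 h_2](u,v) = (-1)^{g_1 h_1 + g_2 h_2} Θ[g_1 g_2; h_1 h_2](-u,-v).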


The squares of these functions are all normal even functions of order 2.
The important fact about these 16 functions is that they have as roots the 16 half-periods
in some combination. From (7.1c) these 16 half-periods are λ/2, μ/2 where


(8.4a) λ = G_1 + H_1 a + H_2 b, μ = G_2 + H_1 b + H_2 c,
where, from now on,
(8.4b) G_1, H_1, G_2, H_2 ∈ {0,1}.

By combining all the identities (8.3abc), we find 


91 92 utaA\ | σι 92 u 
οἰ me) (012) =e) () 


x exp 271[H(u + Gi + 5) + Ho(v+ Get 5) + 


(8.4c) 


91G1 + 92G2 +h, Hi + hoe 
2 


|. 
We look for roots of the theta-functions among the half-periods (u_0, v_0) with
(8.4d) u_0 = -λ/2, v_0 = -μ/2.


By substituting these values in (8.4c) and using symmetry (8.3c), we find
(8.4e) Θ[g_1 g_2; h_1 h_2](λ/2, μ/2) = Θ[g_1 g_2; h_1 h_2](λ/2, μ/2) ε,


with multiplier reminiscent of (8.3c),
(8.4f) ε = exp πi((g_1 + H_1)(h_1 + G_1) + (g_2 + H_2)(h_2 + G_2)).


Of course both sides of (8.4e) vanish if ε = -1. So for each of the sixteen characteristics
there corresponds a set of six roots (i.e., half-periods):


(8.5a) (g_1,g_2,h_1,h_2) → (G_1,G_2,H_1,H_2) (one-to-six)

for which ε = -1. Actually, the six values follow modulo 2 from the columns:
          g_1 + H_1 = 0 1 0 1 1 1
(8.5b)    h_1 + G_1 = 0 1 1 1 0 1
          g_2 + H_2 = 1 0 1 0 1 1
          h_2 + G_2 = 1 0 1 1 1 0


We later verify that half-periods may occur as roots only from the above table. This is 
not a trivial observation, since a function of two variables has as its roots a submanifold 
of dimension one. 

It is convenient to use the dyadic representation for the characteristics of each
theta-function

(8.5c) j = 8g_1 + 4g_2 + 2h_1 + h_2, (0 ≤ j ≤ 15),

and for each of the half-periods

(8.5d) J = 8H_1 + 4H_2 + 2G_1 + G_2, (0 ≤ J ≤ 15).

So from (8.5b) each j engenders six values of J, denoted for later convenience by the set
C_j:

(8.5e) C_j = {J}: j → J, (one-to-six).


This relation is shown in the columns showing six values of J for each C_j:

8.6 THE KUMMER TABLE OF THETA-FUNCTIONS AND ROOTS.


 C_0  C_1  C_2  C_3  C_4  C_5  C_6  C_7  C_8  C_9 C_10 C_11 C_12 C_13 C_14 C_15
   5    4    7    6    1    0    3    2   13   12   15   14    9    8   11   10
  10   11    8    9   14   15   12   13    2    3    0    1    6    7    4    5
   7    6    5    4    3    2    1    0   15   14   13   12   11   10    9    8
  11   10    9    8   15   14   13   12    3    2    1    0    7    6    5    4
  13   12   15   14    9    8   11   10    5    4    7    6    1    0    3    2
  14   15   12   13   10   11    8    9    6    7    4    5    2    3    0    1

8.7 GÖPEL QUADRUPLES OF THETA-FUNCTIONS. These are unordered sets of four
distinct (ordered) quadruples of characteristics


(8.7a) κ_r = (g_1^(r), g_2^(r), h_1^(r), h_2^(r)), (r = 1,...,4)
such that (summing over r in each quadruple κ_r)
(8.7b) Σ g_1^(r) = Σ g_2^(r) = Σ h_1^(r) = Σ h_2^(r)
       = Σ (g_1^(r) h_1^(r) + g_2^(r) h_2^(r)) = 0 mod 2.


They have the property that sign factors of (8.3abc) cancel in the product of four
theta-functions of these characteristics. (There are 60 such sets).


Thus the product of elements of a Göpel quadruple, (say)
(8.7c) Θ_α Θ_β Θ_γ Θ_δ = Θ_{κ_1} Θ_{κ_2} Θ_{κ_3} Θ_{κ_4}
is a normal even theta-function of order N = 4.


9. The Kummer surface. We start with a Göpel quadruple. Since there are only
10 linearly independent even normal functions of order four, some relation exists with
constant coefficients among the 11 indicated functions:


(9.1a) Θ_α Θ_β Θ_γ Θ_δ = Σ_α A_α Θ_α^4 + Σ_{α,β} B_{α,β} Θ_α^2 Θ_β^2.

If we introduce the variables
(9.1b) x = Θ_α^2, y = Θ_β^2, z = Θ_γ^2, w = Θ_δ^2,
then we square (9.1a) to obtain a homogeneous equation of degree four
(9.1c) xyzw = P(x,y,z,w)^2,


(with P(x,y,z,w) the homogeneous form of degree two from (9.1a)). This equation
represents the Kummer surface, which is important because it is isomorphic to the jacobian
modulo even symmetry (u,v) → (-u,-v). This last result is essentially an easy
consequence of the degree four (as we shall later note).

Actually, the Kummer surface has simple singularities, with the neighborhood of a point
looking like a cone (rather than a plane). This condition leads to the form (see [17])


(9.1d) P(x,y,z,w) = [x^2 + y^2 + z^2 + w^2
                     + 2f(yz + xw) + 2g(zx + yw) + 2h(xy + zw)] const.


The roots (8.4a) of the theta-functions, seen previously to be half-periods, are now
also seen to be the singularities of the Kummer surface. Actually, this follows from the
symmetry about the half-periods in (8.4d),


(9.1e) f(u,v) = f(-u,-v) = f(u + λ,v + μ) => f(λ/2 + u, μ/2 + v) = f(λ/2 - u, μ/2 - v).


This is more complicated than symmetry in one variable. It means that f(λ/2 + u, μ/2 + v)
has an expansion in terms of u^2, v^2 and (additionally) uv. So a change of variable is not
enough to remove the singularity (as in the case of a Riemann surface with a branch point).


9.2 THE KUMMER CONFIGURATION. The sixteen singular points lie in sixteen conics
(also denoted by C_j) with six points to a conic as shown in the Kummer table. Each point
lies on six conics.

Each conic lies in a different plane, so that each pair of conics intersects in two
singularities. This creates a correspondence of each of the 120 pairs of singular points to a pair
of (intersecting) conics.


Typical correspondences through intersections are shown here. Note from Kummer’s
table the reciprocity of the relations


(9.2a) C_0 ∩ C_1 ↔ {10,11}, {0,1} ↔ C_10 ∩ C_11.


Also, for any conic (say C_0), each of its fifteen point-pairs (diagonals) will account for
the intersections with the fifteen other conics. For instance, the point-pair {11,13} in C_0
accounts for C_6, i.e., {11,13} = C_0 ∩ C_6.

The 60 Göpel (unordered) quadruples of characteristics (i.e., of theta-functions) become
quadruples of conics {C_i, C_j, C_k, C_l} for which (taken in any or every order),


(9.2b) C_i = (C_i ∩ C_j) ∪ (C_i ∩ C_k) ∪ (C_i ∩ C_l).


Thus {C_0, C_1, C_10, C_11} is such an example:


(9.2c) C_0 ∩ C_1 = {10,11}, C_0 ∩ C_10 = {7,13}, C_0 ∩ C_11 = {5,14},
       C_1 ∩ C_10 = {4,15}, C_1 ∩ C_11 = {6,12}, C_10 ∩ C_11 = {0,1}.


Otherwise expressed, there are 6 singularities on each of the four conics, but in total there
are only 12, as they each appear twice. (Compare [7] for further properties).

A Göpel quadruple of conics (9.2b) has the equivalent property that the singularities
labelled {i,j,k,l} have no subset of three lying on one conic.
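
The combinatorial statements of 8.7 and 9.2 lend themselves to a brute-force check. The sketch below (names ours) rebuilds the sets C_j from (8.4f), enumerates the 4-element sets of characteristics satisfying the parity conditions (8.7b), and tests the covering property (9.2b).

```python
from itertools import combinations

def bits(k):
    """Binary digits of 0 <= k <= 15, most significant first."""
    return (k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1

def conic(j):
    """C_j: labels J of the six half-periods at which theta-function j vanishes, per (8.4f)."""
    g1, g2, h1, h2 = bits(j)
    out = set()
    for J in range(16):
        H1, H2, G1, G2 = bits(J)
        if ((g1 + H1) * (h1 + G1) + (g2 + H2) * (h2 + G2)) % 2 == 1:
            out.add(J)
    return frozenset(out)

def is_goepel(quad):
    """Parity conditions (8.7b) on four characteristics given by their labels j."""
    rows = [bits(j) for j in quad]
    column_sums_even = all(sum(r[i] for r in rows) % 2 == 0 for i in range(4))
    mixed_sum_even = sum(g1*h1 + g2*h2 for g1, g2, h1, h2 in rows) % 2 == 0
    return column_sums_even and mixed_sum_even

C = {j: conic(j) for j in range(16)}
goepel = [q for q in combinations(range(16), 4) if is_goepel(q)]

def covers(quad):
    """Property (9.2b) for every member of the quadruple."""
    return all(C[i] == frozenset().union(*(C[i] & C[j] for j in quad if j != i))
               for i in quad)
```

The enumeration returns 60 quadruples, among them (0, 1, 10, 11), and every one of them satisfies (9.2b); any two distinct conics meet in exactly two of the sixteen points.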


10. Some typical proofs. The original theoretical development was highly intuitive
from Riemann to Poincaré [15] and even beyond. Much of the flavor of the subject was
lost by modern standards of rigor (see [11]).


10.1 POINCARÉ. A theta-function of order N and one of order M have 2MN roots in
common (or infinitely many).


As a limiting case in (7.1a), when b → 0, we have two periodic systems, one for u
of periods (1,a) and one for v of periods (1,c). The theta-functions break up into the
product of two of the familiar elliptic type, and the simultaneous roots in the limiting case
might come from


(10.1a) θ_N^(1)(u) θ_N^(2)(v) = 0, θ_M^(1)(u) θ_M^(2)(v) = 0.
Each θ_N (or θ_M) has N (resp. M) roots, so we make the choice:


(10.1b) θ_N^(1)(u) = θ_M^(2)(v) = 0 versus θ_N^(2)(v) = θ_M^(1)(u) = 0.
With either choice we have MN common roots, hence the result. The counting formula
must hold true for b ≠ 0 by continuity.

As a consequence, any of the theta-functions of order one in (8.2b) indexed by j has only
six of the roots indexed by J as shown in Kummer’s table (no other half-periods). This is
true because Kummer’s table is an exact accounting of (half-period) roots, i.e., two such
functions have only two roots in common, while any two columns share only two roots.

As a second consequence, the Kummer surface (9.1c) is an isomorphic image of the
symmetrized jacobian, (C^2/L)/{±(u,v)}. If we make


(10.1c) x/w = const, y/w = const,
then by (9.1b), this means
(10.1d) Θ_α^2 - (x/w)Θ_δ^2 = Θ_β^2 - (y/w)Θ_δ^2 = 0.


Thus a pair of theta-functions of order two vanishes from (10.1c). Such a pair is satisfied
only by eight values (u,v) or (by symmetry) four values ±(u,v). Of course the line (10.1c)
must have exactly four intersections with the Kummer surface (of degree four). This proves
the desired isomorphism.


PART III. ELLIPTIC CONFIGURATIONS ON THE KUMMER SURFACE
11. Abelian functions and intermediate functions. The central object is the
abelian function, a meromorphic function f(u,v) (of two complex variables) which has a
quadruple set of periods, so it is defined on a manifold like C^2/L in (7.1c). Specifically,


(11.1a) (α_j, β_j) ∈ {(1,0), (0,1), (a,b), (b,c)},


(11.1b) f(u,v) = f(u + α_j, v + β_j), j = 1,...,4.


By a theorem of Appell and Poincaré [1], such a meromorphic function is globally a quotient
of holomorphic nonvanishing intermediate functions θ(u,v) with the property


(11.1c) θ(u + α_j, v + β_j) = θ(u,v) exp(γ_j u + δ_j v + ε_j), j = 1,...,4


with involved consistency conditions on the γ_j, δ_j, ε_j (which come from the commutativity
of the period addition, see [12]).
This structure comes from hyperelliptic curves, like that of (5.2a) if we set


(11.2a) u = ∫_∞^{(x,y)} dx/y + ∫_∞^{(x',y')} dx/y,
(11.2b) v = ∫_∞^{(x,y)} x dx/y + ∫_∞^{(x',y')} x dx/y.


Then the paths on the Riemann surface of H define four independent periods (on say
C^2/L) and symmetric functions such as


(11.2c) f_1 = xx', f_2 = x + x', f_3 = y + y', f_4 = yy',


are abelian functions. We can replace u,v by linear combinations of u,v and choose paths
cleverly so that we return to the periods in


(11.2d) R = ( 1 0 a b )
            ( 0 1 b c ),


with three complex parameters a,b,c (satisfying (7.1b)). It has been shown [14] that
f_1, f_2, f_3 are generating functions of the field of abelian functions, while f_1, f_2, f_4 (as well
as x/w, y/w, z/w of (9.1b)) are generators of the subfield of even functions. (Note that the
change y → -y through a different path leaves f_4 unchanged while (u,v) → -(u,v)).

Special intermediate functions are the theta-functions, which were used to construct
the Kummer surface. Since an algebraic curve on the Kummer surface is rational in the
homogeneous coordinates x,y,z,w of (9.1b), it follows that such a curve is determined by
the zero locus of an intermediate function.


12. Singular abelian manifolds. The abelian manifold $\mathbf{C}^2/L$ is called singular when
the lattice of periods L has a 2 × 2 nonscalar matrix S ($S \notin E\mathbf{Z}$) as an endomorphism, called
a “complex multiplication”


[Figure 2. Quarter-period case]


(12.1a) SR = RM 


for M an integral 4 × 4 matrix. Here S signals a change of basis for (u,v) and M a subbasis
for the lattice of periods R.

Generically, (12.1a) is not possible if det(M) ≠ ±1. This is similar to complex multiplication,
which is nongeneric for elliptic curves over C. In other words, there must be a loss
of free parameters in (11.2d). For instance, we just take the case where a singular relation
happens to be


(12.1b) a = 2c. 


Then the relation (12.1a) becomes valid with 


(12.1.) 5 = (: 2), M = (5 gir) 




This is called a “real” case [9] of complex multiplication since the eigenvalues of S are real
($\pm\sqrt{2}$).
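The relation (12.1a) can be checked numerically for a concrete pair of matrices. The sketch below is only an illustration: the matrices `S` and `M` are one choice that is consistent with (12.1a), (12.1b) and the eigenvalues $\pm\sqrt{2}$, and are an assumption, not necessarily the matrices printed in the original display; the period matrix is taken in the normalized form $(1\ 0\ a\ b;\ 0\ 1\ b\ c)$.

```python
import math

def matmul(X, Y):
    """Plain matrix product of two nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def period_matrix(a, b, c):
    """Normalized 2 x 4 period matrix with complex parameters a, b, c."""
    return [[1, 0, a, b], [0, 1, b, c]]

# Hypothetical choice of S and M (an assumption for illustration).
S = [[0, 2], [1, 0]]
M = [[0, 2, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 2, 0]]

def relation_holds(a, b, c, tol=1e-12):
    """Check S R = R M entry by entry."""
    R = period_matrix(a, b, c)
    left, right = matmul(S, R), matmul(R, M)
    return all(abs(left[i][j] - right[i][j]) < tol
               for i in range(2) for j in range(4))

def eigenvalues_of_S():
    """Roots of the characteristic polynomial of the 2 x 2 matrix S."""
    trace = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    root = math.sqrt(trace * trace / 4 - det)
    return (trace / 2 + root, trace / 2 - root)

b, c = 0.3 + 0.9j, 0.1 + 1.2j
print(relation_holds(2 * c, b, c))        # singular relation a = 2c imposed
print(relation_holds(2 * c + 0.5, b, c))  # generic a: the relation fails
print(eigenvalues_of_S())
```

With this choice det(M) = 4, so the example also illustrates the remark that (12.1a) with det(M) ≠ ±1 needs a special relation among a, b, c.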

To follow through on Humbert’s approach, we note that the intermediate functions
$\theta(u,v)$ of (11.1c) may have special consistency conditions on the exponential coefficients.
Thus we define a singular intermediate function as one whose period relations (11.1c) hold
only by virtue of a singular relation like (12.1b). An example of such period relations is


(12.2a) $\theta(u+1, v) = \theta(u, v+1) = \theta(u,v),$

(12.2b) $\theta(u+a, v+b) = \theta(u,v)\exp 4\pi i v, \quad \theta(u+b, v+c) = \theta(u,v)\exp 2\pi i u.$
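To see why such relations can hold only under a singular relation, one can apply the two quasi-period relations in both orders. The derivation below is a sketch that assumes the relations read $\theta(u+a,v+b) = \theta(u,v)\exp 4\pi i v$ and $\theta(u+b,v+c) = \theta(u,v)\exp 2\pi i u$, and that $\theta$ is not identically zero.

```latex
% Shift by (a,b) after (b,c), and by (b,c) after (a,b).
\begin{align*}
\theta(u+a+b,\,v+b+c)
  &= \theta(u+b,\,v+c)\,\exp\bigl(4\pi i (v+c)\bigr)
   = \theta(u,v)\,\exp\bigl(2\pi i u + 4\pi i v + 4\pi i c\bigr),\\
\theta(u+a+b,\,v+b+c)
  &= \theta(u+a,\,v+b)\,\exp\bigl(2\pi i (u+a)\bigr)
   = \theta(u,v)\,\exp\bigl(2\pi i u + 4\pi i v + 2\pi i a\bigr).
\end{align*}
% Comparing the two right-hand sides:
\[
\exp\bigl(2\pi i\,(2c - a)\bigr) = 1,
\qquad\text{that is,}\qquad a - 2c \in \mathbf{Z}.
\]
```

The relation a = 2c of (12.1b) satisfies this condition, while for generic a, b, c the two orders of addition are incompatible.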


Now the vanishing locus of a singular intermediate function is a singular algebraic curve
on the Kummer surface. If everything is viewed from the perspective of one of the singu- 
larities (at the tangent cone) there will be six conics passing through the singularity, lying 
in six planes. These planes are viewed as six lines with 15 intersections (for the remaining 
15 singularities). Some singular curve now will project into a configuration with two conics 
and six lines. 

We shall describe the configuration and skip the details involved in its justification 
(which is unfortunately lengthier than the entire current presentation!) 

We look again at Figure 2. Here the six lines are those dashed lines, actually tangents
to C(z) passing through the points numbered 1,2,3,4,5,6. (The notation has the risk of
confusing these points with singularities but, actually, only four of the 15 singularities are
shown, at intersections labeled [u], [u + z], [u − z], [u + 2z].)

Also C[0] is the projection of some singular curve, and the fact that the points 1, ..., 6 lie
on six tangents from C[0] causes the points to be special. (Only five constraints determine
a conic.) A further link in this arcane chain of reasoning is that the parameters of the
hyperelliptic curve represent the six points of tangency. This explains the specialization
of the hyperelliptic (5.2bc) (see [4] for explicit computation).


13. Concluding remarks. There is a larger literature for real multiplication by $\sqrt{5}$
because the conditions on the hyperelliptic curve are more interesting from the standpoint
of group theory. Here there is a special “fifth-period” configuration which we do not pursue,
restricting ourselves to the easier case of $\sqrt{2}$. These are the only two cases which can be
treated with any degree of satisfaction as of now (see [2], [9], [19]).

The author gratefully acknowledges helpful conversations with Armand Brumer. 




REFERENCES 


1. P. Appell, Sur les fonctions périodiques de deux variables, Journ. de Math. (4) 7
(1891), 157-219.

2. P. Bending, Curves of genus 2 with √2 multiplication; (unpublished dissertation).

3. J.W.S. Cassels and E.V. Flynn, “Prolegomena to a Middlebrow Arithmetic of Curves 
of Genus 2,” Cambridge Univ. Press, 1996. 

4. H. Cohn, A hyperelliptic curve with real multiplication of degree two; (see Contents).

5. J.D.Fay, “Theta Functions on Riemann Surfaces,” Springer Verlag, 1973; Lect. Notes 
in Math 352. 

6. E. Hecke, Höhere Modulfunktionen und ihre Anwendung auf die Zahlentheorie, Math.
Annalen 71 (1912), 1-37.

7. M.R. Gonzalez-Dorrego, (16,6) configurations and geometry of Kummer surfaces in
P^3, Memoirs of the AMS 512 (1994).

8. G. Humbert, Théorie générale des surfaces hyperelliptiques, Journ. de Math. (4) 9 
(1893), 27-171, 361-475. 

9. G. Humbert, Sur les fonctions abéliennes singuliéres I, Journ. de Math. (5) 5 (1899), 
233-350. 

10. G. Humbert, “Cours d’Analyse II,” Gauthier-Villars, 1904, pp. 238-249. 

11. H. Lange and C. Birkenhake, “Complex Abelian Varieties,” Springer-Verlag, 1992. 

12. J. Lewittes, Riemann surfaces and the theta function, Acta Mathematica 111 (1964), 
37-61. 

13. J.-F. Mestre, Courbes hyperelliptiques à multiplications réelles, C.R. Acad. Sci. Paris
(1) 307 (1988), 721-724.

14. E. Picard, “Quelques Applications Analytiques de la Théorie des Courbes et des Sur- 
faces Algébriques,” Gauthier-Villars, 1931; (Notes taken by J. Dieudonné) . 

15. H. Poincaré, Sur les fonctions abéliennes, Acta Math. 26 (1902), 43-98.

16. B. Poonen, Computational aspects of curves of genus at least 2, Algorithmic Number 
Theory (ANTS II, H. Cohen, ed.) (1996), 283-306; Second International Symposium, 
Talence, France; Springer-Verlag (Lecture Notes in Computer Science 1122). 

17. G. Salmon, “A Treatise on the Analytic Geometry of Three Dimensions II,” 1914, pp. 
50-51; Chelsea Reprint 1965 . 

18. C.E. Traynard, “Fonctions Abéliennes et Fonctions Théta de Deux Variables,” Mém. 
de Sci. CLI Gauthier-Villars, 1962; (Lectures of P. Painlevé 1902) 

19. J. Wilson, Curves of genus 2 whose Jacobians have a √5 multiplication; (unpublished
dissertation).




IDA Center for Computing Sciences 
Bowie, MD 20715-4300 
email: hcohn@super.org 


ARITHMETICITY AND THETA CORRESPONDENCE 
ON AN ORTHOGONAL GROUP 


Ze-Li Dou
To Professor Kenneth B. Kramer 


Introduction 


In this paper we discuss several related topics in the theory of automorphic forms, 
on which the author’s current research is focused. Roughly speaking, we shall deal 
with the questions of arithmeticity regarding special values of certain zeta and L- 
functions, as well as algebraicity results concerning periods and Fourier coefficients 
of certain automorphic forms. 

The author has attempted to maintain a balance among several goals. He nat- 
urally wishes to bring the reader to his current research. On the other hand, 
he considers it an essential part of his task to explain how his work fits into the 
larger program concerning the same topics mentioned above, which was initiated 
by Shimura, and is being intensely pursued by Shimura himself and many others. 
From this point of view, it is not desirable to showcase only the results in one spe- 
cific setting. Finally, as this paper is based on a talk given at the Graduate Center 
of the City University of New York, the author also wishes to address as large an 
audience as is practical. This puts a further constraint on the amount of techni- 
cal material involved, at least initially. Consequently, he has adopted a “gradual 
generalization” approach—the article begins with a motivational section discussing 
the simplest possible setting only, and then each subsequent section introduces a 
generalization in at least one direction. Also, in each of the sections §§2-4, we 
concentrate on only one key concept. Thus in §2 we discuss the arithmeticity of 
automorphic forms; a nearly holomorphic arithmetic function (at critical points) is 
constructed in §3; and the main part of §4 is devoted to the discussion of a theta 
correspondence. These topics are all mutually related, and such relationships are 
pointed out in the discussions in the text. In general terms, we investigate, in 
the setting of a totally real quadratic extension of a totally real algebraic number 
field, a theta correspondence (see §4), and the algebraicity results on periods and 
L-values that follow (such as results of the type mentioned in §2), which can be 
derived with the aid of certain arithmeticity results of functions analogous to the 
f(w,s) considered in §3. 

With this approach, the author hopes to have given an adequate explanation of 
the essential points of that part of the program in which his research is engaged. 
Admittedly, this is neither the most efficient nor the most “logical” approach—one 


Research for this article was partially supported by NSA Grant MDA904-97-1-0109 and by the 
Texas Christian University Research Foundation. 


D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004


could have started from the setting of §4 and then proceeded backwards, thereby 
eliminating §1 altogether. Also, with this approach this paper cannot contain a 
full list of the author’s results with all their technical precision. However, he views 
his results as ones complementing those of Shimura, in a naturally generalized 
setting. Therefore, an understanding of Shimura’s methodology is crucial. For this 
reason the author considers the tradeoff more than reasonable, in the sense that 
the author’s results can to a certain extent be “inferred” from an understanding of
Shimura’s theorems in this paper, but not the other way around. Finally, due to 
the limitation of space, no proof has been included in this paper. To compensate 
for these shortcomings, the author has inserted references throughout this paper, 
where precise statements of the latest results can be found, as well as their proofs. 

I would like to thank Toni Bluher for several conversations concerning both the 
content and style of this paper; I have found them extremely helpful. It is also 
my most pleasant duty to thank the editors, Professor Melvyn Nathanson and 
Professors David and Gregory Chudnovsky, for their invitation to give a talk at the 
Graduate Center, and for their subsequent invitation to include this paper in their 
volume. In addition, Professor Josef Dodziuk’s hospitality is responsible for some 
of the fondest memories of my visit to New York City in 1997. 

This paper is dedicated to Professor Ken Kramer of Queens College, as a token 
of my sincere gratitude to him. I am very fortunate to have been a recipient of his 
generous and constant support for well over a decade, starting from my years as an 
undergraduate student. 


1. Two examples of arithmeticity and near holomorphy 


In this section we shall provide some motivation for the concept of arithmeticity 
by considering two examples in the case of the most classical modular forms. In so 
doing we follow the lead of Shimura. (See [Sh95].) We introduce some notation as 
follows: 


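$H = \{z \in \mathbf{C} \mid \mathrm{Im}(z) > 0\}, \quad F = \mathbf{Q},$
and

$\Gamma_0(N) = \left\{ \alpha = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL_2(\mathbf{Z}) \;\middle|\; c \equiv 0 \pmod{N} \right\}.$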


A meromorphic function $f : H \to \mathbf{C}$ is called an automorphic form of weight k for
some $k \in \mathbf{Z}$ if it satisfies the following condition with respect to the usual action of
elements $\alpha \in \Gamma_0(N)$ on $z \in H$, namely,


$f(\alpha z) = j(\alpha, z)^k f(z), \quad \forall \alpha \in \Gamma_0(N),$


where
$\alpha z = \dfrac{az+b}{cz+d}$ and $j(\alpha, z) = cz + d$.


We shall write
$(f\|_k\alpha)(z) = j(\alpha, z)^{-k} f(\alpha z).$


Then the above condition becomes simply


(1) $f\|_k\alpha = f, \quad \forall \alpha \in \Gamma_0(N).$
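Condition (1) is meaningful because $j(\alpha, z) = cz + d$ satisfies the cocycle relation $j(\alpha\beta, z) = j(\alpha, \beta z)\, j(\beta, z)$, which makes $f \mapsto f\|_k\alpha$ a right action. The following is a minimal numerical sketch of that fact; the function names and the sample matrices in $\Gamma_0(4)$ are illustrative choices, not part of the text.

```python
import cmath

def act(alpha, z):
    """Moebius action of alpha = (a, b, c, d) on a point z of the upper half plane."""
    a, b, c, d = alpha
    return (a * z + b) / (c * z + d)

def j(alpha, z):
    """Factor of automorphy j(alpha, z) = c z + d."""
    _, _, c, d = alpha
    return c * z + d

def mul(alpha, beta):
    """Product of two 2 x 2 matrices stored as (a, b, c, d)."""
    a, b, c, d = alpha
    e, f, g, h = beta
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def slash(f, k, alpha):
    """Return the function f ||_k alpha, i.e. z -> j(alpha, z)^(-k) f(alpha z)."""
    return lambda z: j(alpha, z) ** (-k) * f(act(alpha, z))

# Two sample elements of Gamma_0(4) (determinant 1, lower-left entry divisible by 4).
alpha = (1, 1, 4, 5)
beta = (3, 2, 4, 3)
z = 0.2 + 0.7j

# Cocycle relation.
print(abs(j(mul(alpha, beta), z) - j(alpha, act(beta, z)) * j(beta, z)))

# Right-action property: f || (alpha beta) = (f || alpha) || beta for any f.
f = lambda w: cmath.exp(2j * cmath.pi * w)
lhs = slash(f, 4, mul(alpha, beta))(z)
rhs = slash(slash(f, 4, alpha), 4, beta)(z)
print(abs(lhs - rhs))
```

Both printed differences should vanish up to floating-point rounding.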




A holomorphic automorphic form f is one which is holomorphic on H and whose
Fourier expansion at every cusp has the form $\sum_{n=0}^{\infty} a_n e^{2\pi i n z/h}$, where h is a positive
integer. That is, we require f to be holomorphic at every cusp. (Note that
this requirement is not needed in the case of F ≠ Q.) If in fact $a_0 = 0$, then
we say f is a cusp form. The space of all such holomorphic automorphic forms
is denoted by $M_k(\Gamma_0(N))$; its subspace of cusp forms is denoted by $S_k(\Gamma_0(N))$.
Both of these spaces are finite dimensional vector spaces, whose dimensions can be
easily computed via the Riemann-Roch Theorem. To avoid having to refer to a
specific congruence subgroup such as $\Gamma_0(N)$, we take the union of all such spaces
of holomorphic automorphic forms and denote it simply by $M_k$. Similarly we have
the definition of $S_k$.

There are commutative, self-adjoint (with respect to the Petersson inner product) 
linear operators acting on the spaces of automorphic forms, which are called Hecke 
operators. As a consequence, there exist common eigenforms with respect to the 
Hecke operators. Also, we have the theory of newforms. These topics are very 
standard, and therefore we will not give any further explanation. A normalized 
Hecke eigenform will be referred to as a primitive form. 

Let $\varphi$ be a Dirichlet character defined modulo N. Then a character on $\Gamma_0(N)$
is given by $\varphi(\alpha) := \varphi(d_\alpha)$, where $d_\alpha$ is the (2,2)-entry of $\alpha$. We may consider the
elements f of $M_k$ satisfying the condition


$f\|_k\alpha = \varphi(\alpha) f, \quad \forall \alpha \in \Gamma_0(N).$


Suppose f is a cusp form satisfying this condition. Then we can write


$f(z) = \sum_{n=1}^{\infty} a_n e^{2\pi i n z}.$


For every primitive Dirichlet character $\chi$, we define a Dirichlet series $D(s, f, \chi)$ as
follows:


$D(s, f, \chi) = \sum_{n=1}^{\infty} \chi(n)\, a_n\, n^{-s}.$


Here s is a complex variable. This series is convergent on a half plane and can be 
analytically continued to the entire C-plane. 
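As a concrete illustration of such a series, the sketch below computes partial sums of $D(s, f, \chi)$ for one familiar primitive form. The choices are ours and not taken from the text: f is the discriminant form $\Delta(z) = q\prod_{m\ge 1}(1-q^m)^{24}$ of weight 12 and level 1, and $\chi$ is the primitive character modulo 4.

```python
def delta_coefficients(n_max):
    """Fourier coefficients a_1, ..., a_{n_max} of q * prod_{m>=1} (1 - q^m)^24."""
    poly = [0] * n_max          # coefficients of the product, degrees 0..n_max-1
    poly[0] = 1
    for m in range(1, n_max):
        for _ in range(24):
            # Multiply in place by (1 - q^m), truncating at degree n_max - 1.
            for i in range(n_max - 1, m - 1, -1):
                poly[i] -= poly[i - m]
    # The extra factor q shifts every index by one.
    return {n: poly[n - 1] for n in range(1, n_max + 1)}

def chi_mod_4(n):
    """The primitive Dirichlet character modulo 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def twisted_partial_sum(s, coeffs, chi):
    """Partial sum of D(s, f, chi) over the available coefficients."""
    return sum(chi(n) * a_n * n ** (-s) for n, a_n in coeffs.items())

coeffs = delta_coefficients(8)
print([coeffs[n] for n in range(1, 7)])
print(twisted_partial_sum(10, coeffs, chi_mod_4))
```

The partial sums only approximate the series inside its half plane of convergence; the values at the remaining integers m in the theorem below are obtained through the analytic continuation, which this sketch does not attempt.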

We can now state the following theorem: 
Theorem. Suppose k > 1 and f is a primitive Hecke eigenform. There exist two
periods $p_+$ and $p_-$ such that


$\tau(\chi)\,(\pi i)^{-m} D(m, f, \chi) \in p_\varepsilon K_f K_\chi, \quad 0 < m < k,$


where $\tau(\chi)$ is the Gaussian sum of $\chi$, $K_f K_\chi$ is the field generated over Q by the $\chi(n)$
and the Fourier coefficients $a_n$ of f, and $\varepsilon = \pm 1$ is determined by the relation
$\chi(-1)\varepsilon = (-1)^m$.

For a proof, see for example [Sh76]. Note that there are infinitely many possible
characters $\chi$ and hence also infinitely many values $D(m, f, \chi)$ as described above.




Thus the existence of the periods $p_+$ and $p_-$ in the above theorem is remarkable and
is a manifestation of the arithmeticity of the values of the zeta function $D(s, f, \chi)$.
Our next example is an Eisenstein series, which in the simplest case is given by 


$E_k(z,s) = \sum_{\alpha \in (P \cap \Gamma)\backslash\Gamma} \varphi(\alpha)\, j(\alpha, z)^{-k}\, (\mathrm{Im}(\alpha z))^{s}, \quad z \in H,\ s \in \mathbf{C}.$


Here we have written $\Gamma_0(N)$ as $\Gamma$ for brevity, and the meanings of k, $\alpha$, $\varphi$, and
$\varphi(\alpha)$ are the same as above. The symbol P is defined by $P = \{\alpha \in SL_2(\mathbf{Q}) \mid
c_\alpha = 0\}$, where $c_\alpha$ is the (2,1)-entry of $\alpha$. This Eisenstein series is convergent for
k + Re(2s) > n + 1. Also, after we multiply $E_k(z,s)$ by a non-zero entire function,
the resulting product can be continued to a real analytic function on H × C which
is holomorphic in s. We are interested in the arithmetic nature of $E_k(z,m)$ for
certain critical integers. As an example, let us take m = 0, k = 2, and the trivial
character $\varphi = 1$. Then we have


$E_2(z,0) = -\dfrac{3}{\pi\,\mathrm{Im}(z)} + 1 - 24 \sum_{n=1}^{\infty} \Bigl( \sum_{0 < d \mid n} d \Bigr) e^{2\pi i n z}.$

Notice that this is not holomorphic. For this reason we introduce the notion of
near holomorphy.¹ A function $f : H \to \mathbf{C}$ is said to be nearly holomorphic if it
can be written as a finite sum of expressions of the form $p(\mathrm{Im}(z)^{-1})\,g(z)$, where
p is a polynomial and g is a holomorphic function. We may also speak of nearly
holomorphic automorphic forms if we add in the automorphy condition (1).² We
then say that a nearly holomorphic automorphic form is arithmetic, or $\overline{\mathbf{Q}}$-rational,
if the Fourier coefficients of such a form are algebraic. Theorems of Shimura, when
specialized to our example at hand, then show that a modified version of $E_2(z,0)$ is a
nearly holomorphic and arithmetic automorphic form. In fact, Shimura showed that
it is $\mathbf{Q}_{ab}$-rational. For an illuminating discourse on the concepts of arithmeticity
and near holomorphy, Shimura’s [Sh95] is strongly recommended. Later on in this
paper, we shall present, in certain more general settings, the results mentioned in
this section in precisely formulated forms. We shall also see the interconnection of
such results.
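The near holomorphy of the weight 2 example can be tested numerically. The sketch below assumes the level one expansion $E_2(z,0) = 1 - 24\sum_{n\ge 1}\sigma_1(n)e^{2\pi i n z} - 3/(\pi\,\mathrm{Im}(z))$, with $\sigma_1(n)$ the sum of the positive divisors of n, and checks the weight 2 transformation law (1) on the two standard generators of $SL_2(\mathbf{Z})$. The function names and the truncation length are our own choices.

```python
import cmath
import math

def sigma1(n):
    """Sum of the positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def e2_star(z, terms=80):
    """Truncated nearly holomorphic Eisenstein series of weight 2 (level one)."""
    q = cmath.exp(2j * math.pi * z)
    holomorphic_part = 1 - 24 * sum(sigma1(n) * q ** n for n in range(1, terms + 1))
    # The non-holomorphic correction is a polynomial in Im(z)^(-1) times a constant.
    return holomorphic_part - 3 / (math.pi * z.imag)

z = 0.3 + 1.1j

# alpha = (0, -1; 1, 0): alpha z = -1/z and j(alpha, z) = z.
print(abs(e2_star(-1 / z) - z ** 2 * e2_star(z)))

# alpha = (1, 1; 0, 1): alpha z = z + 1 and j(alpha, z) = 1.
print(abs(e2_star(z + 1) - e2_star(z)))
```

Dropping the term $-3/(\pi\,\mathrm{Im}(z))$ breaks the first identity, which is the reason the holomorphic part alone is not an automorphic form.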


2. Arithmetic automorphic forms with respect to a quaternion algebra 


In this section we shall define arithmeticity more precisely. As always in this 
paper, we confine our discussion to the orthogonal case. Also, in accordance with
the plan of “gradual generalization” we have adopted, we shall not operate in the
most general setting possible. For the general development we refer the reader to
Shimura’s papers [Sh80], [Sh86], and [Sh87]. (See also §3 of this paper.) In this


¹ If we had chosen k > 2 in this case, the situation would have been simpler. But the occurrence
of near holomorphy is natural and not sporadic. See [Sh86].

² In the case of our example, we also need to assume that f has a Fourier expansion of the
form $(f\|_k\alpha)(z) = \sum_{a} \mathrm{Im}(z)^{-a} \sum_{n=0}^{\infty} c_{\alpha a n} \exp(2\pi i n z/N_\alpha)$, the sum over a being finite. This condition is not needed if we
take a number field different from Q or if the automorphic forms are defined on multiple copies of
H.




section, we shall consider the setting of a quaternion algebra over a totally real
algebraic number field. The main references for this section are [Sh81I] and [Sh81II].
Let, then, F be a totally real algebraic number field with [F : Q] = n, and let
B be a quaternion algebra over F. Recall that this means B is a central simple
algebra of dimension 4 over F. We have an isomorphism of the following type:


(2) $B \otimes_{\mathbf{Q}} \mathbf{R} \cong M_2(\mathbf{R})^{r} \times \mathbb{H}^{n-r},$


where $\mathbb{H}$ denotes the ring of Hamilton quaternions. We arrange the archimedean
primes $v_i$ in such a way that B is unramified at $v_1, v_2, \dots, v_r$ and ramified at
$v_{r+1}, v_{r+2}, \dots, v_n$. We assume that r > 1. Notice that this development allows the
possibility $B = M_2(F)$; therefore it includes the Hilbert modular forms as a special
case.

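Writing A for the adele ring of Q, we let


$B_{\mathbf{A}} = B \otimes_{\mathbf{Q}} \mathbf{A}$ and write $G_{\mathbf{A}} = (B_{\mathbf{A}})^{\times}$.


Here the symbol $R^{\times}$ denotes the group of invertible elements of any ring R. Then
we have $G_{\mathbf{Q}} = B^{\times}$. The infinite part of $G_{\mathbf{A}}$ will be denoted by $G_{\infty}$, which is
isomorphic to $GL_2(\mathbf{R})^{r} \times (\mathbb{H}^{\times})^{n-r}$. We also write the identity component of $G_{\infty}$
as $G_{\infty+}$. We then define


$G_{\mathbf{A}+} = \{x \in G_{\mathbf{A}} \mid x_{\infty} \in G_{\infty+}\}$ and $G_{\mathbf{Q}+} = G_{\mathbf{Q}} \cap G_{\mathbf{A}+}$.

For every $0 \le m \in \mathbf{Z}$, there is an R-rational irreducible representation $\sigma_m :
\mathbb{H}^{\times} \to GL_{m+1}(\mathbf{C})$ of degree m, which is unique up to equivalence. By fixing a
suitable isomorphism (2), we may assume that the following properties are satisfied:


$x_{v_i} \in M_2(\overline{\mathbf{Q}}), \quad \forall\, 1 \le i \le r,\ \forall x \in B;$
$\sigma_m(x_{v_i}) \in M_{m+1}(\overline{\mathbf{Q}}), \quad \forall\, r+1 \le i \le n,\ \forall x \in B;$
$\sigma_m(x^{*}) = {}^{t}\overline{\sigma_m(x)}, \quad \forall x \in \mathbb{H}.$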


Here * denotes the main involution of B as well as its natural extension to $B_{\mathbf{A}}$. If $\kappa =
(\kappa_{r+1}, \kappa_{r+2}, \dots, \kappa_n) \in \mathbf{Z}^{n-r}$ is such that $\kappa_i \ge 0$, then we may define a representation
$\sigma_\kappa$ on $G_{\mathbf{A}}$ by


$\sigma_\kappa(x) = \bigotimes_{i=r+1}^{n} \sigma_{\kappa_i}(x_i).$

The representation space is $\bigotimes_{i=r+1}^{n} \mathbf{C}^{\kappa_i+1}$, which has dimension $d = \prod_{i=r+1}^{n}(\kappa_i+1)$.

We can now define the action of elements of $G_{\mathbf{Q}+}$ on $H^{r}$, the factor of holomorphy
j(x, z) (where $z \in H^{r}$), and congruence subgroups of $G_{\mathbf{Q}}$ as natural generalizations
of the corresponding concepts defined in the previous section. This done, then given
$k \in \mathbf{Z}^{r}$ and a vector valued mapping $f : H^{r} \to \mathbf{C}^{d}$, where $d = \prod_{i=r+1}^{n}(\kappa_i + 1)$, we
define another mapping $f\|_{k,\kappa}x$ of the same type by the following formula:


$(f\|_{k,\kappa}x)(z) = \prod_{i=r+1}^{n} N(x_i)^{\kappa_i/2}\, j(x, z)^{-k}\, \sigma_\kappa(x)^{-1} f(x(z)).$


+1 




If $\Gamma$ is a congruence subgroup, then we denote by $A_{k,\kappa}(\Gamma)$, $M_{k,\kappa}(\Gamma)$, and $S_{k,\kappa}(\Gamma)$
the space of meromorphic, holomorphic, and cusp automorphic forms, respectively,
with respect to $\Gamma$. That is, we require $f\|_{k,\kappa}\gamma = f$ for all $\gamma \in \Gamma$. If $B = M_2(\mathbf{Q})$,
then we also need to impose the usual condition at the cusps. As usual, the union
of the spaces $A_{k,\kappa}(\Gamma)$ over all congruence subgroups $\Gamma$ is denoted simply by $A_{k,\kappa}$.
The symbols $M_{k,\kappa}$ and $S_{k,\kappa}$ are defined in the same way.

Furthermore, if $\varphi$ is a character of $\Gamma$ of finite order, then $M_{k,\kappa}(\Gamma,\varphi)$ is the
subspace of $M_{k,\kappa}$ such that $f\|_{k,\kappa}\gamma = \varphi(\gamma)f$ for all $\gamma \in \Gamma$. Similarly we have
the definition of $S_{k,\kappa}(\Gamma,\varphi)$. Notice that the distinction between $M_{k,\kappa}$ and $S_{k,\kappa}$ is
unnecessary if B is a division algebra, because in that case $M_{k,\kappa} = S_{k,\kappa}$.

Next we turn our attention to the meaning of arithmeticity for automorphic
forms. Let K be a totally imaginary quadratic extension of F. Suppose h is
an F-linear embedding of K into B. Then we have $h(K^{\times}) \subset G_{\mathbf{Q}}$; and $h(K^{\times})$
has a unique common fixed point on $H^{r}$. This point is called a CM-point. To
define arithmeticity, we need to make use of a mapping $P_{k,\kappa}$, which takes values in
$GL_d(\mathbf{C})$, and is constructed by means of a bilinear mapping related to the periods
of abelian varieties. Since we do not wish in this article to stress the technical
details, we refer the reader to Shimura’s [Sh80] and [Sh81I] for the definition of
these mappings. An element $f \in A_{k,\kappa}$ is said to be arithmetic at a CM-point w (at
which f is holomorphic) if $P_{k,\kappa}(w)^{-1}f(w)$ has algebraic components. If this is so
for all such CM-points, then f is called an arithmetic automorphic form. We shall
use the symbol $\overline{\mathbf{Q}}$ to indicate the set of arithmetic elements. Therefore we speak of
$S_{k,\kappa}(\overline{\mathbf{Q}})$, $S_{k,\kappa}(\Gamma, \overline{\mathbf{Q}})$, $S_{k,\kappa}(\Gamma, \varphi, \overline{\mathbf{Q}})$, and so forth.

We have the following identities: 


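$M_{k,\kappa} = M_{k,\kappa}(\overline{\mathbf{Q}}) \otimes_{\overline{\mathbf{Q}}} \mathbf{C}$, and $S_{k,\kappa} = S_{k,\kappa}(\overline{\mathbf{Q}}) \otimes_{\overline{\mathbf{Q}}} \mathbf{C}$.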


For the remainder of this section, we shall describe an algebraicity result concerning
a zeta function. In order to do so, we first need to consider adelic automorphic
forms, which are defined on $G_{\mathbf{A}}$. Given an integral ideal $\mathfrak{c}$ of F and a character
$\psi$, we can define the space of automorphic forms and its subspace of cusp forms of
weight $(k, \kappa)$, level $\mathfrak{c}$, and character $\psi$. These spaces will be denoted by $\mathcal{M}_{k,\kappa}(\mathfrak{c}, \psi)$
and $\mathcal{S}_{k,\kappa}(\mathfrak{c}, \psi)$. Elements of such spaces will be denoted by bold face letters. In the
interest of economy of space, we shall not develop the definition of such forms from
scratch. It turns out that every element $\mathbf{f} \in \mathcal{M}_{k,\kappa}(\mathfrak{c},\psi)$ can be identified as a vector
$(f_1, \dots, f_h) \in \prod_{\lambda=1}^{h} M_{k,\kappa}(\Gamma_\lambda, \varphi_\lambda)$ for suitably defined $\Gamma_\lambda$ and $\varphi_\lambda$, $\lambda = 1, \dots, h$. We
refer to, say, [D93] for a full explanation. The meaning of arithmeticity for such
adelic automorphic forms is a natural generalization of what we have just defined:
if $\mathbf{f} \in \mathcal{M}_{k,\kappa}(\mathfrak{c},\psi)$ is identified with $(f_1, \dots, f_h)$, then $\mathbf{f}$ is called arithmetic if all $f_\lambda$
are arithmetic for $\lambda = 1, \dots, h$.

Assume that F has a subfield E such that [E : Q] = r and that the restrictions
of $v_1, \dots, v_r$ to E are all different from one another. Our zeta function involves
three main ingredients: a locally constant function $\eta$ on F, which is $\overline{\mathbf{Q}}$-valued, a
Hilbert modular form $\Omega$ of $M_{l,0}(M_2(E))$, and a primitive Hecke eigenform $0 \ne \mathbf{f} \in
\mathcal{S}_{k,\kappa}(\mathfrak{c},\psi)$. The automorphic forms are taken among the arithmetic elements. This
is possible because of the identities mentioned above concerning $S_{k,\kappa}$ and $S_{k,\kappa}(\overline{\mathbf{Q}})$,
etc. Recall, also, that a function $\eta$ is locally constant if there exist two lattices L



and M in F such that $\eta(x) = 0$ for $x \notin L$ and $\eta(x)$ depends only on x modulo M.
We can find a subgroup U of $\mathfrak{o}_F^{\times}$, where the symbol $\mathfrak{o}_F$ stands for the ring of integers
in F, such that $\eta(ux) = \eta(x)$ for all $u \in U$. We then say that $\eta$ is U-invariant.
Let us take such a locally constant function $\eta$ and a Hilbert modular form $\Omega$ as
above, and write

$\Omega(z) = \sum_{a \in E} \omega(a)\, e_E(az),$

where $e_E(az) = \exp\bigl(2\pi i \sum_{i=1}^{r} a^{v_i} z_i\bigr)$. Here we have denoted the restrictions of $v_i$
to E by the same symbols for notational simplicity. Let $U_E$ denote the group of
totally positive units in E. We then take U to be a subgroup of $U_E$ of finite index
such that both $\eta$ and $\Omega$ are U-invariant, and introduce the notation $[U] = [U_E : U]^{-1}$.

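As for the primitive Hecke eigenform $\mathbf{f}$, suppose we have $\mathbf{f} \mid T_{\mathfrak{c}}(\mathfrak{a}) = \chi(\mathfrak{a})\mathbf{f}$ for
all $\mathfrak{a}$, where the $T_{\mathfrak{c}}(\mathfrak{a})$ denote the Hecke operators. Then the set $\{\chi(\mathfrak{a})\}$ is called a
primitive system of eigenvalues. The zeta function Z(s) is defined by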


Ζ(5) -- [0] Σ᾽  n(b)x(br)wu(Treye(tb))b° 2”. 
O<bEF/U 


Here ἐξ Εἰ is such that t’ > 0 for1<i<randt"% <O0Oforr+1<i<n;r 
is a fractional ideal of F'; c = Σ᾿", citi, where the c; are integers or half-integers; 
bs® = (b%,... ,b’")?; and wy(a) = a%w(a), where u = uyv) +--+: + Unvn with 
0<u;€Z. 

We can now state a result of Shimura ([Sh81II]) concerning the algebraicity of
certain critical values of the zeta function Z(s) as follows:


Theorem. Suppose | + 2u+ 2Rer/p(c) = y(v1 +++: +n) with y € Z, 2c; = ky 
(mod 2) fori < r, κι < 2c; = κι (mod 2) fori > στ. Then for every integer 
satisfying 

k; . , 
(1) (2) -14+1F: ΕἸ «90 « (2) τὸ 1{1 “τ Ξ-τ; and 
(11) 280 A7+2[F: ΕἸ, fE=Q, 
we have Z(so) finite, and 

Z(so) ~ w!*\(g,g). 


Here $x \sim y$ means $x/y \in \overline{\mathbf{Q}}$, and $\mathbf{g}$ is any arithmetic Hecke eigenform belonging to
the system of eigenvalues $\{\chi\}$.
Moreover, if in addition we have


Qo +k; +2=7+2[F: ΕἸ, Vl <i<r, 


then the residue of Z ats = (7): : ΕἸ is an algebraic number times ΠΕ πίε (8, 8), 
where Rr ts the regulator of E. 


The above theorem is certainly rather complicated technically; several more
theorems of this nature can be stated in this setting, and can be found in [Sh81II],




together with detailed proofs. Moreover, theorems of this type have been obtained 
by Shimura in much more general settings, rendering the theorem stated above 
only a special case. For example, we may replace F by a direct sum of totally 
real algebraic number fields. The function f(w,s), which we define in the next 
section, will be given in this generality. In any case, for a fuller discussion we 
refer to [Sh83] and [Sh88]. In addition to the intrinsic interest of such results,
which is evident, there are also important consequences concerning the L-functions 
attached to automorphic forms—indeed, they may be regarded as special cases 
of the theorems of the above type. (The Theorem in §1 is a simple example of 
this.) Furthermore, these results are also intimately related to the several period 
invariants, also investigated by Shimura. Shimura’s article [Sh88] is one recent 
reference in which all these aspects are discussed in depth. Also in that paper, 
a list of far-reaching conjectures are stated on the precise nature of the periods 
of automorphic forms. Several of these conjectures have been settled by Shimura 
himself ([Sh90]), in the case where we have a division quaternion algebra. Parallel
results for the non-division algebra case (i.e., the case of Hilbert modular forms) 
have been obtained by the author; see [D94]. M. Harris and H. Yoshida also made 
penetrating contributions in this direction. See especially Yoshida’s paper [Y985]. 

We are interested in a further generalization of the type of results treated in 
this section. Namely, we wish to investigate zeta functions of the above type in the 
setting involving a field extension. In particular, we wish to examine the case where 
the field extension is a totally real quadratic extension of a totally real algebraic 
number field. A theta lifting for automorphic forms with respect to such a field 
extension will be constructed in the final section of this paper. 


3. The function f(w,s) in the orthogonal case 


As we have explained, the Theorem in the previous section can be considered 
as a generalization of the Theorem we cited in §1. In this section we shall con- 
struct a mapping f(w,s), which can be regarded as a generalized version of the 
Eisenstein series we discussed in §1. Here the variable w belongs to a certain 
bounded symmetric domain and s is a complex variable. We shall see that f(w, s) 
is nearly holomorphic and arithmetic at certain critical points s. This does not 
merely round out the ideas about arithmeticity introduced in the first section, but, 
more importantly, the proof of the Theorem of §2 is aided by the consideration 
of such a function. (See [Sh81II] for the setting of §2 and [Sh88] for the general
setting.) Moreover, we shall see that the mapping f(w,s) is defined via a theta 
correspondence, which leads naturally to the main topic of the next section. 

The main references for this section are Shimura’s [Sh86] and [Sh87], as well as
the work of A. W. Bluher, [B90], [B94], and [B98]. We shall largely follow the
notation in [Sh86] in this section. The setting of this section includes that of §2 as
a special case, as we shall see.

Let F denote a direct sum of totally real algebraic number fields $F_1, \dots, F_t$. Since
many of the concepts in this setting are to be defined as natural generalizations of 
corresponding ones in §2, we shall denote them by the same symbols. For example, 




the symbol $e_F(z)$ will now have the meaning


$e_F(z) = e\Bigl(\sum_{\sigma \in J_F} z_\sigma\Bigr) = e\Bigl(\sum_{i=1}^{t} \sum_{\sigma \in J(F_i)} z_\sigma\Bigr) = \exp\Bigl(2\pi i \sum_{\sigma \in J_F} z_\sigma\Bigr).$


The symbol $J(F_i)$ here means the set of Archimedean primes of $F_i$. Such modifications
in meaning will not be explicitly mentioned, as so doing will increase
the length of this article unnecessarily. Instead, we refer the reader to the papers
referred to above for precise definitions, whenever there is a doubt.

We assume that E is a common subfield of $F_1, \dots, F_t$, and that $J_F$ contains a
subset $\varepsilon$ such that the restriction of places of $\varepsilon$ to E yields a bijection of $\varepsilon$ onto
J(E), with $\varepsilon \cap J(F_i) \ne \emptyset$ for all $1 \le i \le t$. Thus our E is analogous to the field in
§2 denoted by the same symbol. We write $\varepsilon' = J_F - \varepsilon$.

We now recall some general principles concerning the action of a certain orthog- 
onal group on a bounded symmetric domain. See [Sh80] and [B90] for details. 

Let n be a positive integer and V be a vector space over R of dimension n + 2,
and let S be a symmetric bilinear form on V (over R) of signature (n,2). Then
S can be naturally extended to a C-valued symmetric form on $V \otimes_{\mathbf{R}} \mathbf{C}$. Let us
consider the set


$\mathcal{N}(S) = \{v \in V \otimes_{\mathbf{R}} \mathbf{C} \mid S[v] = 0,\ S(v, \bar{v}) < 0\},$
where $S[v] := S(v,v)$. Denote by $\mathcal{B}$ the set of ordered bases (x, y) of two-dimensional
totally negative subspaces of the quadratic space (V, S) such that S[x] = S[y] = −1.


Then we have a bijection $\mathcal{B} \to \mathcal{N}(S)/\mathbf{R}_{+}$, which is induced by $(x,y) \mapsto x + iy$.
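The two defining conditions of $\mathcal{N}(S)$ are easy to check on an explicit pair. The sketch below takes S = diag(1, 1, 1, −1, −1), so n = 3, and a pair (x, y) with S[x] = S[y] = −1; it additionally assumes S(x, y) = 0, which is what makes S[x + iy] vanish. The specific vectors and the function names are illustrative choices.

```python
import math

SIGNATURE = [1, 1, 1, -1, -1]   # a form of signature (3, 2), chosen for illustration

def S(u, v):
    """Complex-bilinear extension of the diagonal form (no conjugation)."""
    return sum(s * a * b for s, a, b in zip(SIGNATURE, u, v))

def conj(v):
    """Entrywise complex conjugate of a vector."""
    return [complex(a).conjugate() for a in v]

def negative_orthonormal_pair(t):
    """A pair (x, y) with S[x] = S[y] = -1 and S(x, y) = 0, depending on a real t."""
    x = [math.sinh(t), 0.0, 0.0, math.cosh(t), 0.0]
    y = [0.0, 0.0, 0.0, 0.0, 1.0]
    return x, y

x, y = negative_orthonormal_pair(0.4)
v = [a + 1j * b for a, b in zip(x, y)]

print(S(v, v))          # S[v] = S[x] - S[y] + 2i S(x, y), which should vanish
print(S(v, conj(v)))    # S(v, v-bar) = S[x] + S[y] = -2 < 0
```

Multiplying v by a positive real number preserves both conditions, which is why the target of the map is a quotient of $\mathcal{N}(S)$.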
We define the block diagonal matrices


Q = diag[In, -- 14], R= diag 5 (2, ral 


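where $1_k$ denotes the identity matrix of order k. Then a matrix $A \in GL_{n+2}(\mathbf{C})$ (or,
strictly speaking, a C-isomorphism $V \otimes_{\mathbf{R}} \mathbf{C} \to \mathbf{C}^{n+2}$) can be found, such that the
following identities hold:

$S(x,y) = {}^{t}(Ax)R(Ay)$, and $S(x,\bar{y}) = {}^{t}(Ax)Q\,\overline{(Ay)}$, $\quad \forall x, y \in V \otimes_{\mathbf{R}} \mathbf{C}.$
Therefore the mapping $v \mapsto Av$ gives a C-linear isomorphism


$A : \mathcal{N}(S) \to \mathcal{N}(Q,R) := \{u \in \mathbf{C}^{n+2} \mid {}^{t}uRu = 0,\ {}^{t}uQ\bar{u} < 0\}.$


The space $\mathcal{N}(Q,R)$ has two connected components, which may be described as
follows. Let


(3) $Z_n = \{w \in \mathbf{C}^{n} \mid {}^{t}\bar{w}w < 1 + \tfrac{1}{4}\,|{}^{t}ww|^{2} < 2\}.$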



We define two mappings $p : Z_n \to \mathcal{N}(Q,R)$ and $p' : Z_n \to \mathcal{N}(Q,R)$ by


ει τὺ 

_ Ww Ι _ 1 
p(w) = >a and p(w) = tow 

1 2 


Then p and p’ naturally give rise to bijective mappings of $\mathbf{C}^{\times} \times Z_n$ onto the components
of $\mathcal{N}(Q,R)$ alluded to above. Indeed, we may simply define $(c, w) \mapsto c\,p(w)$
and $(c,w) \mapsto c\,p'(w)$, respectively. Clearly the two components are homeomorphic
to each other. We denote these components by Mo(Q, R) and N4(Q, R), and the 
corresponding components of N(S) by No(S) and Ng(S). 


Summarizing, we have the following sequence of bijections:

(4) $Z_n \xrightarrow{\;p\;} \mathcal{N}_0(Q,R)/\mathbf{C}^{\times} \xrightarrow{\;A^{-1}\;} \mathcal{N}_0(S)/\mathbf{C}^{\times} \longrightarrow$
{two-dimensional totally negative subspaces of (V, S)}.

We introduce here a majorizing form of S which will be needed in the next
section. If $w \in Z_n$, then by (4) we see that w corresponds to a subspace W
spanned by the real and imaginary parts of $w' = A^{-1}p(w)$ over R. Let $W^{\perp}$ be
the orthogonal complement of W. Then S is negative definite on W and positive
definite on $W^{\perp}$. Our (positive definite) majorizing form $P_w$ is then defined by
$P_w = -S$ on W and $P_w = S$ on $W^{\perp}$.


For notational clarity, we shall from now on write
$P[v; w] = P_w[v]$.
We then have the following formula:
(5) $P[v; w] - S[v] = -4\,S(w', \bar{w}')^{-1}\,|S(v, w')|^{2} \ge 0.$
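Formula (5), read as $P[v;w] - S[v] = -4\,S(w',\bar{w}')^{-1}|S(v,w')|^2$, can be confirmed numerically in a small case. The sketch below assumes S = diag(1, 1, 1, −1, −1), a negative plane W spanned by an S-orthogonal pair x, y with S[x] = S[y] = −1, and $w' = \lambda(x + iy)$ for an arbitrary nonzero complex scalar $\lambda$. All names and sample vectors are illustrative.

```python
import math

SIGNATURE = [1, 1, 1, -1, -1]   # signature (3, 2), chosen for illustration

def S(u, v):
    """Complex-bilinear extension of the diagonal form."""
    return sum(s * a * b for s, a, b in zip(SIGNATURE, u, v))

def majorant(v, x, y):
    """P[v] = -S[v_W] + S[v_perp], with v_W the S-orthogonal projection of v onto W."""
    # Since S[x] = S[y] = -1 and S(x, y) = 0, the projection coefficients are -S(v, x), -S(v, y).
    cx, cy = -S(v, x), -S(v, y)
    v_w = [cx * a + cy * b for a, b in zip(x, y)]
    v_perp = [a - b for a, b in zip(v, v_w)]
    return -S(v_w, v_w) + S(v_perp, v_perp)

def right_hand_side(v, x, y, scale):
    """-4 S(w', conj(w'))^(-1) |S(v, w')|^2 for w' = scale * (x + i y)."""
    w1 = [scale * (a + 1j * b) for a, b in zip(x, y)]
    w1_bar = [c.conjugate() for c in w1]
    return -4 * abs(S(v, w1)) ** 2 / S(w1, w1_bar).real

t = 0.4
x = [math.sinh(t), 0.0, 0.0, math.cosh(t), 0.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0]
v = [0.5, -1.2, 2.0, 0.7, -0.3]

lhs = majorant(v, x, y) - S(v, v)
rhs = right_hand_side(v, x, y, scale=0.8 - 0.3j)
print(lhs, rhs)
```

The right-hand side does not change when w' is rescaled, in agreement with the fact that w is only determined up to $\mathbf{C}^{\times}$ in (4).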


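Returning to the preparation for the definition of f(w,s), let $V_i$ be a vector
space of dimension $n_i + 2$ over $F_i$ for each $1 \le i \le t$ with $n_i \ge 0$, and consider an
$F_i$-bilinear symmetric form $S_i : V_i \times V_i \to F_i$. We adopt the following notation:


$V = \prod_{i=1}^{t} V_i, \quad G(S_i) = \{\alpha \in SL(V_i) \mid S_i(\alpha x, \alpha x) = S_i(x,x)\},$


$S = \{S_1, \dots, S_t\}, \quad G(S) = \prod_{i=1}^{t} G(S_i).$


If $\tau \in J_F$, then $\tau \in J(F_i)$ for some i, and we denote by $V_\tau$ the $\tau$-completion of
$V_i$ and by $S_\tau$ the natural extension of $S_i$ to $V_\tau \otimes_{\mathbf{R}} \mathbf{C}$. We also write $n_\tau = n_i$. We
then have the identification


$V \otimes_{\mathbf{Q}} \mathbf{C} = \prod_{\tau \in J_F} V_\tau \otimes_{\mathbf{R}} \mathbf{C}.$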




In the following, we assume that $S_\tau$ has signature $(n_\tau, 2)$ on $V_\tau$ if $\tau \in \varepsilon$, and
that $S_\tau$ has signature $(n_\tau + 2, 0)$ on $V_\tau$ if $\tau \in \varepsilon'$.
We can define $Z_\tau$ as in (3) for each $\tau$, and let


$Z = \prod_{\tau \in \delta} Z_\tau.$


The symbol $\delta$ in the definition above is defined as follows. Arrange indices in such
a manner that we have $n_i > 0$ for $1 \le i \le r$ and $n_i = 0$ for $r+1 \le i \le t$. We
assume $0 < r \le t$. We then put


$\delta = \varepsilon \cap J(F_1 \times \cdots \times F_r)$, and $\delta' = \varepsilon' \cap J(F_1 \times \cdots \times F_r)$.


Given $0 \le \kappa \in \mathbf{Z}^{\delta'}$, we define $P_\kappa(S)$ to be the vector space of C-valued polynomial
functions on $\prod_{\tau \in \delta'} V_\tau$ which are homogeneous of degree $\kappa_\tau$ and $S_\tau$-harmonic on
$V_\tau$. (See [Sh80] for the definition of $S_\tau$-harmonic functions.) We now take $\Omega(z) =
\sum \omega(a)\, e_E(az)$ to be an element of $M_{l,0}(E)$ with $0 \le l$. (Therefore $\Omega$ is allowed
to be a constant.) Let $U_E$ and U be defined in the same way as in the previous
section. Then we define

f(w,s)=(U} δ᾽ δ()ω(τερ, ε(-- 5[0]}} 510] ᾳ(υ)υϑ 


οτένεν 
lv, w)*n(w)*"|u[v, υἹδυ ἢ". 


Here c € L(V) is a locally constant function, and q € P,(S). Also, O<k€ Ζ5.0 « 
KE Z° 0 <jE€Z? dE Z*, and w is a subset of Jy: such that 


Resy,jp,(y A J(V;)) =eNI(R), Vier41,... yt, 


where Iv Ἐ' μι Χο x Vy. Finally, pw is a factor of automorphy defined by 


u(r, w] = S,(a,, Ap" p(wr)) = (Are) Rp(wr), re II ΜΕ Z. 


We have already explained that consideration of mappings of type f(w,s) is 
indispensable for algebraicity results such as the Theorem in the previous section, 
which, in turn, yields further results in the study of arithmetic. Here we note that 
the setting of the previous section corresponds to the case t =r = 1 here. 

We now consider the question of near holomorphy and arithmeticity. The meanings of near holomorphy and arithmeticity are natural generalizations of the definitions given in the previous sections. The precise definitions can be found in [Sh86]. (See also [B90] and [B98].) Let us state a result of Shimura ([Sh86]) as follows.


Theorem. The series $f(w,s)$ can be continued as a meromorphic function to the whole $s$-plane; more precisely, there exist a non-zero entire function $A(s)$ and a real analytic function $B(w,s)$ on $Z \times \mathbf{C}$, which is holomorphic in $s$, such that


A(s)f(w,s) = B(w,s) 




for large Re(s). Furthermore, under certain conditions, $f(w,s)$ is finite, nearly holomorphic, and arithmetic. This is more explicitly explained as follows. We assume that

(1) k; > O for every7t €6 and φ. >0 foro gwpUryp; 

(τ Resp p(k — καὶ — 27) — Κεθκ, Ε(Φ) —1 = agep, where ag € 2-*Z; and 

(11t) be => hop, where p 15 the complez conjugation. 

Then we put | 


t 
by = art 2 -- Σ αι, + 2)[F; : ΕἸ12. 
w=] 

If by > 1, then f(w,0) is finite; in fact it is holomorphic in w with a few 
exceptions which can be explicitly identified. Furthermore, with some additional 
conditions, r* pi (—¢, 2) f(w,o) is arithmetic, where the power * can be explicitly 
given, and px 1s defined in [Sh80] as we alluded to in 82. 

If by = 0 or 1/2, then f(w,s) has at most a simple pole ats =1— by. Leth be 
the residue there. Then, under some additional conditions, Rz'n* px (—¢, 2p)h 18. 
arithmetic, where the power * can be explicitly computed, though not always equal 
to the power above. 


The conditions omitted above can be found in [Sh86]. The proof can be found 
in [Sh88]. Shimura’s techniques have been extended by A. W. Bluher to show that 
such results hold for a certain range of other integers and half integers as well. 
Moreover, by imposing somewhat more stringent conditions on the locally constant 
function c as well as the Hilbert modular form 2, Bluher showed that similar results 
hold for an even larger range of critical points for another series f(w,s), which is 
defined as a product of f(w,s) with a certain L-series. These results can be found 
in the papers [B90] and [B94]. 

Another feature of the theory we wish to mention here is the fact that f(w,s) 
has an integral representation of the type 


f(w,s) = [2G)0G, wi ο)ν" Be, saz}, 


where E(z,s) is an Eisenstein series and O(z,w;C) is a certain theta function. 
Therefore, we may say that f(w,s) is defined via a theta correspondence. In this 
regard, Bluher has proved that to a certain extent, arithmeticity is preserved by 
theta correspondence. (See [B98].) This result is relevant in the discussion of the
arithmeticity of $f(w,s)$ above. Although we do not wish to discuss this particular
result in detail, the idea of theta correspondence will be our focus in the next 
section. 


4. Theta correspondence with respect to a quadratic extension 


In this final section of the paper, we shall present a theta correspondence of 
automorphic forms with respect to a totally real quadratic extension of a totally 
real algebraic number field. Thus our setting is again a generalization of that of §2, 
but in a different direction from that of §3. In this setting, one may again hope to 
derive results analogous to those stated in §2, and to do so is the main focus of the 




author’s current research. Rather than entering into a discussion on the generalities, 
we shall devote most of this section to a description of the correspondence in our 
specific setting, together with some of the properties. 

Let us explain our new setting. Happily, most of the groundwork has already been laid in the last two sections, and consequently we can afford to be quite efficient. Let $F$ be a totally real algebraic number field, and $E/F$ a totally real quadratic extension.³ Let $B_E$ be a quaternion algebra over $E$ which is equipped with an $F$-linear automorphism $\tau$ such that $\tau^2 = \mathrm{id}_{B_E}$, but $\tau|_E \neq \mathrm{id}_E$. The main involution of $B_E$ will be denoted by $*$.

We shall need to consider the following subsets of $B_E$:

$$B = \{x \in B_E \mid x^\tau = x\} \quad\text{and}\quad V = \{x \in B_E \mid x^\tau = -x^*\}.$$

It can easily be shown that $B$ is a quaternion algebra over $F$, and $V$ is a vector space over $F$ of dimension 4.
Regarding the adele rings, we fix notation by setting 


Boo = M2(R)'x H® = and (Βε)ω & M2(R)S x HY. 


Let a be the set of Archimedean primes of Ε΄, For each v € a, we fix, once and 
for all, an extension ὦ € J(£). The collection of these primes is then written ἐ. 
Further, we denote by ἢ and η΄, respectively, the subsets of « corresponding to ὃ 
and 6’. To summarize our notation, we have the following identities: 


δι Ξε δι1 δ΄, and J(E)=CUC'; 
J(£)=cuUre, and ἐξ ηἰ!η’; 
C=nluty, and ¢ =n! Urn’. 


The meaning of the notation Tv, etc., is self-evident. Throughout this paper, we 
shall assume (as in an analogous situation in 82) that ¢ 4 0. 

We now define holomorphic automorphic forms on ἢ“. This time they are matrix 
valued mappings taking values in End(’), where 4 is defined in the same manner 
as the representation space in §2 denoted by the same symbol, and have weight 
k-++rk, where k € Ζ'. 

Given a mapping f : HS — End(4’), we define another mapping of the same 
kind, fllnae.a, by the following formula: 


(flle+ree)(w) = 7(α, w)*™-7 o(N(a) 8 a7) f(aw)o(N(a)” Fa"). 
We may then define S,474(T) and Sy474(Be) by generalizing the definition in 82 
in a natural way. The detailed definitions can be found in [D97a].


To define our theta function, we take $2N$, where $N$ is the norm, to be the symmetric $F$-bilinear form $S$ (on $V$) in §3. Thus we have

$$S(x,y) = \mathrm{Tr}(xy^*), \qquad \forall x, y \in V.$$
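The relation between $S(x,y) = \mathrm{Tr}(xy^*)$ and $2N$ can be illustrated in the simplest split case. The sketch below is an assumption-laden toy model, not the setting of the paper: it takes the quaternion algebra to be the $2 \times 2$ matrices, the main involution $*$ to be the adjugate, the norm $N$ to be the determinant, and the trace to be the matrix trace. All function names are hypothetical.

```python
def adjugate(x):
    """Main involution on 2x2 matrices: (a b; c d) -> (d -b; -c a)."""
    (a, b), (c, d) = x
    return ((d, -b), (-c, a))

def matmul(x, y):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def trace(x):
    return x[0][0] + x[1][1]

def det(x):
    return x[0][0] * x[1][1] - x[0][1] * x[1][0]

def add(x, y):
    return tuple(tuple(x[i][j] + y[i][j] for j in range(2)) for i in range(2))

def S(x, y):
    """S(x, y) = Tr(x y*), with * the adjugate."""
    return trace(matmul(x, adjugate(y)))

x = ((1, 2), (3, 5))
y = ((0, -1), (4, 7))
print(S(x, x), 2 * det(x))                      # S(x, x) against 2 N(x)
print(S(x, y), det(add(x, y)) - det(x) - det(y))  # S against the polarization of N
```

In this model $S(x,x) = 2N(x)$ and $S$ is the symmetric bilinear form obtained by polarizing $N$, which is the sense in which $2N$ is taken as the form $S$.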


³ Note that E is no longer a subfield of F!




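In this case, the majorizing form of $S_v$ for $v \in \delta$ can be explicitly computed. It turns out that we have

$$P_v[\xi; w] = 2N(\xi) + \mathrm{Im}(w_1)^{-1}\,\mathrm{Im}(w_2)^{-1}\,|[\xi; w_1, w_2]|^2,$$
$$\forall v \in \delta,\ \forall \xi \in V_v,\ \forall w = (w_1, w_2) \in \mathfrak{H} \times \mathfrak{H}.$$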


We note that here the discussion in §2 is relevant, since the computations can be essentially reduced to that case.

If C € L(V) is a locally constant function and an element r € F is chosen such 
that r, > 0 forvédé andr, «0 for v € 6’, then the theta function is defined to be 
an End(A’)-valued function 6(z,w;C,r) on H® x Ης“ as follows: 


(6) O(z,w;C,r) = Im(z)?Im(w)7*7~ 7k) 
> CELE, WI)" o(Eer (TRIE, z,u)). 
EEV 
Here 


N(Eu)Z0; ifulp =ve δ΄, 


RE, z,wly = 
Geile { eye tale) PUm(o kale), tule awe δ 


In order to give the theta lift and its properties explicitly, which is crucial for 
algebraicity results, we must also derive explicit transformation formulas for the 
theta function. 

Let us first consider the action of (Bz)% on the variable ὦ. If a € (B ΕἸΣ and 
N(a) € F, then 

O(z,w;Cyr)||k¢r,a = O(z,w;3Ca,r), 
where Ca(€) qe C(afa~"). Also, if p€ F™, then 
6(z,w;C,r) = 0(z,w;C',p’r), 
7 def κ᾿ η! 
where Ο΄(ξ) = p27" C(pé). 

We now consider the action of elements of SL2(F’) on the variable z. In this 
case we can specialize certain results of Shimura to our setting. See [Sh93] for a 
discussion in the general setting. Let SLo(F) be the G, and the form 2r'N be the 
S, in §3 of [Sh93]. (Notice that, therefore, the usage of the symbol S differs in 
this paragraph from the rest of the paper.) Thus S is of signature (2,2) for v € ὁ 
and signature (0,4) for v € 6’. Then the q loc. cit. equals 4. We then have 
F(det(S )2) = FE and a straightforward calculation shows that 


J>(a,z) = [7(α, 2)2 a (a, 2)", 


where J°(a,z) is Shimura’s notation, except that here we have written z in place 
of Shimura’s 3. We further note the following notational correspondences (in our 
specific setting), writing the notations of Shimura in [Sh93] on the right-hand side: 


Ga=M; L(V) =S(Vi"); 




CELIV) HAE S(Y); Ὁ 4+ °X for y =o € Ga. 


In such a case we have the following results. Notations as above, every 7 € Ga 
gives rise to a C-linear automorphism of L(V), (this action depends on the choice 
of r!) which we denote by (7, C) ++ yC, such that the following properties hold: 


(a) 7(γ, 2) *O(yz, w;yC,r) = O(z,w;C,r), VyEG; 
(b) (y¥d)C =7(6C), νγ,δ € Ga. 


(c) For every C, there exists a congruence subgroup [ of G, such that 


yC=C, Vy er. 
(d) If (2) € Gy and 62) = 0, then 


((2)C)(x) = laa) [awn (aay en (7 N(x)a(2b(2))C (xa), 


where ὦ denotes the Hecke character of F corresponding to FE, and ep (y) def ea (Yh). 


We also have, for p € F', p > 0, 
p? (pz, w;C,r) = 6(z, w; p** 3C, pr). 
We now define the theta lifting as follows. Given ἢ € 5;447x%(Be), we define 
I(z;C,r;h) = (0(z,w;C,r), h(w)). 
Theorem. We have 


I(z;C,r;h) = υοϊ( Ὁ). } [ ΤΥ (θίτω;Ο, r)h(w)) Im(w)*9t7 kD dow, 
D 


where D = A\H§ and A is a suitable congruence subgroup. I(z;C,r;h) ts a Hilbert 
modular form belonging to S;(SL2(F)), and its Fourier coefficients are related to 
the periods of ἢ. More precisely, we have an expression of the form 


vol(D)I(z;C,rjh) = κ᾿ alll N(a)|~° C (a) P(h, a, A)er(—r N(a)z), | 
where P(h,a, A) is a certain period of h, which is defined by


P(h,a, A) = [ -" [a, pw)*” Tr (‘o(a)h(pw)) d(Aap). 


a 


Here a is a totally negative element of V, and w € H§ can be arbitrarily chosen. 
In fact, the above integral is independent of the choice of w, and vanishes for all a
which are not totally negative. Finally the symbols Ag and Jy are suitably chosen 
subgroups of SL2(R)S. 


For a proof of the above theorem, see [D97a]. This theorem generalizes a theorem 
of Shimura (see [Sh82]). Its significance is partly due to the relation to the period 




conjectures of Shimura; for a discussion of that connection we refer the reader to [Sh88]. On a more concrete level, we see that several of the key concepts discussed in §2 have already made an appearance in the Theorem above. In any case, this clearly provides a setting in which suitable generalizations of theorems of the type mentioned in §§2-3 can be sought. In fact, if we retrace our development through those two sections, we may to a certain extent anticipate such results. Therefore, the consideration of the setting of this section is natural. Also, in view of Yoshida's work (see [Y95]), our setting is relevant. Without engaging ourselves in any speculations, we simply mention that the author has obtained several results in this direction, which can be found in the papers [D97a] and [D97b]. A third work is currently in preparation, which he hopes to complete in the near future. Since these results require a considerable amount of further technical preparation, their precise statements are omitted here.


REFERENCES 


[B90] A. W. Bluher, Near holomorphy of some automorphic forms at critical points, Invent. Math. 102 (1990), 335-376.

[B94] A. W. Bluher, Arithmeticity of some automorphic forms at critical points, Amer. J. Math. 116 (1994), 1283-1335.

[B98] A. W. Bluher, Near holomorphy, arithmeticity, and the theta correspondence, to appear in [DDG98].

[DDG98] R. Doran, Z.-L. Dou, G. Gilbert, eds., Proceedings of Symposia in Pure Mathematics, to appear 1998.

[D93] Z.-L. Dou, Fundamental periods of certain arithmetic cusp forms, Thesis, Princeton University (1993).

[D94] Z.-L. Dou, On the fundamental periods of Hilbert modular forms, Trans. Amer. Math. Soc. 346 (1994), 147-158.

[D97a] Z.-L. Dou, On a theta correspondence with respect to a quadratic extension, Preprint.

[D97b] Z.-L. Dou, Theta correspondence and Hecke operators relative to a quadratic extension, Preprint.

[Sh76] G. Shimura, The special values of the zeta functions associated with cusp forms, Comm. Pure and Appl. Math. 29 (1976), 783-804.

[Sh80] G. Shimura, The arithmetic of certain zeta functions and automorphic forms on orthogonal groups, Ann. Math. 111 (1980), 313-375.

[Sh81I] G. Shimura, On certain zeta functions attached to two Hilbert modular forms, I, Ann. Math. 114 (1981), 127-164.

[Sh81II] G. Shimura, On certain zeta functions attached to two Hilbert modular forms, II, Ann. Math. 114 (1981), 569-607.

[Sh82] G. Shimura, The periods of certain automorphic forms of arithmetic type, J. Fac. Sci. Univ. Tokyo (Sect. IA) 28 (1982), 605-632.

[Sh83] G. Shimura, Algebraic relations between critical values of zeta functions and inner products, Amer. J. Math. 104 (1983), 253-285.

[Sh86] G. Shimura, On a class of nearly holomorphic automorphic forms, Ann. of Math. 123 (1986), 347-406.

[Sh87] G. Shimura, Nearly holomorphic functions on Hermitian symmetric spaces, Math. Ann. 278 (1987), 1-28.

[Sh88] G. Shimura, On the critical values of certain Dirichlet series and the periods of automorphic forms, Invent. Math. 94 (1988), 245-305.

[Sh90] G. Shimura, On the fundamental periods of automorphic forms of arithmetic type, Invent. Math. 102 (1990), 399-428.

[Sh93] G. Shimura, On the transformation formulas of theta series, Amer. J. Math. 115 (1993), 1011-1052.




[Sh95] G. Shimura, Arithmeticity of the special values of various zeta functions and the periods of abelian integrals, Sugaku Expositions 8 (1995), 17-38.

[Sh97] G. Shimura, Euler products and Eisenstein series, CBMS Regional Conference Series in Mathematics 93 (1997).

[Y95] H. Yoshida, On a conjecture of Shimura concerning periods of Hilbert modular forms, Amer. J. Math. 117 (1995), 1019-1038.


MATHEMATICS DEPARTMENT, P. O. BOX 298900, TEXAS CHRISTIAN UNIVERSITY, FORT WORTH, TX 76129
E-mail address: z.dou@tcu.edu 


MORPHIC HEIGHTS AND PERIODIC POINTS 


M. EINSIEDLER, G. EVEREST, AND T. WARD 
June 18, 2000 


ABSTRACT. An approach to the calculation of local canonical mor- 
phic heights is described, motivated by the analogy between the 
classical height in Diophantine geometry and entropy in algebraic 
dynamics. We consider cases where the local morphic height is 
expressed as an integral average of the logarithmic distance to the 
closure of the periodic points of the underlying morphism. The 
results may be thought of as a kind of morphic Jensen formula. 


1. INTRODUCTION 


Let $\phi : \mathbb{P}^1(\mathbb{Q}) \to \mathbb{P}^1(\mathbb{Q})$ be a morphism of degree $d$, defined over the rationals. Call, Goldstine and Silverman (see [3], [4]) have associated to $\phi$ a canonical global morphic height $\lambda_\phi$ on $\mathbb{Q}$, with the properties that

1. $\lambda_\phi(\phi(q)) = d\lambda_\phi(q)$ for any $q \in \mathbb{P}^1(\mathbb{Q})$;

2. $q$ is pre-periodic if and only if $\lambda_\phi(q) = 0$.

A point $q$ is called pre-periodic under $\phi$ if the orbit $\{\phi^{(n)}(q)\}$ is finite (write $f^{(n)}$ for the $n$th iterate of a map $f$). The global height decomposes into a sum of local canonical morphic heights $\lambda_{\phi,v}$:

$$\lambda_\phi(q) = \sum_v n_v \lambda_{\phi,v}(q).$$


Here $v$ runs over all the valuations (both finite and infinite) of the number field generated by $q$ and the $n_v$ denote the usual normalising constants. In the special case that $\phi$ takes the form $\phi[x, y] = [y^d f(x/y), y^d]$ for a polynomial $f$ of degree $d$, Call and Goldstine [3] prove that

$$\lambda_{\phi,v}(q) = \lim_{n\to\infty} d^{-n}\lambda_v(\phi^{(n)}(q)), \qquad (1)$$

where $\lambda_v$ is the local projective height $\lambda_v(q) = \log^+|q|_v$, and that the local height $\lambda_{\phi,v}(q)$ vanishes if and only if $|\phi^{(n)}(q)|_v$ is bounded for all

1991 Mathematics Subject Classification. 11S05, 37F10.

The first author acknowledges the support of EPSRC postdoctoral award 
GR/M49588, the second thanks Jonathan Lubin and Joe Silverman for the AMS 
Sectional meeting on Arithmetic Dynamics at Providence, RI, 1999. 


D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 




$n$, and finally, that $q$ is pre-periodic if and only if

$$\lambda_{\phi,v}(q) = 0 \text{ for all } v. \qquad (2)$$
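At the infinite place the limit in (1) can be approximated directly by iterating the polynomial. The following is a minimal floating-point sketch under stated assumptions: the function name, the escape threshold and the iteration cap are illustrative choices, not part of the paper, and the value returned is a truncation of the limit rather than the limit itself.

```python
import math

def local_height_at_infinity(f, d, q, n_iter=60, escape=1e30):
    """Approximate lim d^{-n} log^+ |f^(n)(q)| for a polynomial f of degree d.

    f is a Python callable, q a real or complex starting point. Iteration
    stops once |f^(n)(q)| exceeds `escape`, to stay clear of overflow.
    """
    z = q
    value = 0.0
    for n in range(1, n_iter + 1):
        z = f(z)
        size = abs(z)
        # log^+ is max(0, log); a zero iterate contributes 0.
        value = max(0.0, math.log(size)) / d ** n if size > 0 else 0.0
        if size > escape:
            break
    return value

# f(z) = z^2: the canonical height agrees with log^+ |q|.
print(local_height_at_infinity(lambda z: z * z, 2, 3.0), math.log(3.0))
# f(z) = z^2 - 1 with q = 0: the orbit 0, -1, 0, -1, ... is bounded.
print(local_height_at_infinity(lambda z: z * z - 1, 2, 0.0))
```

For a pre-periodic point such as $q = 0$ under $z^2 - 1$ the approximation is $0$, in line with the vanishing criterion stated above.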


Example 1.1. 1. Let $f(z) = z^d$ with $d > 1$. Here the canonical morphic heights and the projective heights agree. Jensen's formula ([13, Theorem 15.18]) gives

$$\int_{S^1} \log|y - q|\,dm(y) = \log^+|q|,$$

where $m$ is the Haar measure on the circle $S^1$. The circle is also the Julia set for this morphism on $\mathbb{C}$, and it is the closure of the set of non-zero periodic points, which are all roots of unity. In the $p$-adic case, the Julia set is empty but it is still true that the local height is the Shnirel'man integral of the logarithmic distance from the closure of the set of periodic points. For all $v$, finite and infinite, the following holds:

$$\log^+|q|_v = \lim_{n\to\infty} \frac{1}{n} \sum_{\zeta \neq q:\ \zeta^n = 1} \log|\zeta - q|_v.$$
For finite v this may be seen, for instance, using Section 3. Alter- 
natively, it follows from the Diophantine estimate 


lq” — α]0 > C(q)|n|. 


provided the left hand side is non-zero (cf. Remark 3.6). For $v \mid \infty$ a result from transcendence theory (in this case, Baker's theorem) is needed.
2. Suppose $a, b \in \mathbb{Q}$ with $4a^3 + 27b^2 \neq 0$ and let

$$f(z) = \frac{z^4 - 2az^2 - 8bz + a^2}{4(z^3 + az + b)}.$$

Then $f$ gives rise to a morphism of degree 4 which describes the duplication map on an elliptic curve. The global and local morphic heights coincide with the usual notions of height on the curve. For the infinite valuations, the local height is again the integral average of the logarithmic distance to points on the Julia set, which is the closure of the set of periodic points. At the singular reduction primes it is still true that the local height is the integral average of the logarithmic distance to the periodic points. Both of these assertions are proved in [6] where this morphism was used to construct a dynamical system which interprets these heights in dynamical terms. The proofs require elliptic transcendence theory to show that a rational point cannot approximate a periodic point too closely (cf. Proposition 3.8). This is part of a much broader analogy between heights in Diophantine geometry and entropy in algebraic dynamics (see [5], [6], [7], [10]).
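For the first item of Example 1.1 and the usual absolute value on $\mathbb{C}$, the average of $\log|\zeta - q|$ over the $n$th roots of unity can be computed directly and compared with $\log^+|q|$. This is only a numerical sketch for the archimedean place, with hypothetical function names; it says nothing about the $p$-adic places, and it avoids points $q$ on the unit circle, where convergence is the delicate part.

```python
import cmath
import math

def root_of_unity_average(q, n):
    """(1/n) * sum of log|zeta - q| over the n-th roots of unity zeta != q."""
    total = 0.0
    for k in range(n):
        zeta = cmath.exp(2j * cmath.pi * k / n)
        dist = abs(zeta - q)
        if dist > 0:  # skip the term zeta == q, as in the formula
            total += math.log(dist)
    return total / n

def log_plus(x):
    """log^+ x = max(0, log x) for x >= 0."""
    return max(0.0, math.log(x)) if x > 0 else 0.0

for q in (2 + 1j, 0.3, -1.5):
    print(q, root_of_unity_average(q, 400), log_plus(abs(q)))
```

Off the unit circle the average equals $\frac{1}{n}\log|q^n - 1|$, which explains the fast convergence seen in the output.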


In this paper, our purpose is to describe a family of examples where 
the local canonical morphic height can be expressed as a limiting inte- 
gral over periodic points of the underlying morphism. The finite and 
infinite cases require different approaches. In both, we consider the spe- 
cial class of morphisms corresponding to affine polynomial maps and 
in the former case, we assume good reduction in the sense of Morton 
and Silverman [12]. 

There are two directions in which this work can be made more sophisticated that are not pursued here. The first is to give a more formal interpretation of the limiting process using Shnirel'man integrals (see [14], or [8] for a modern treatment); the second is to extend the arguments to other morphisms.


2. COMPLEX CASE 


Assume that $\phi : \mathbb{P}^1(\mathbb{C}) \to \mathbb{P}^1(\mathbb{C})$ is a morphism of degree $d$, with $\phi[x, y] = [y^d f(x/y), y^d]$ for a polynomial $f$ of degree $d$. For basic definitions of complex dynamics, consult [1]. The following theorem expresses the local morphic height as an integral over the Julia set $J(f)$ of the polynomial $f$, and standard results from complex dynamics show that this is in turn a limiting integral over periodic points.


Theorem 2.1. If $f(z) = az^d + \cdots + a_0$ is a polynomial, then for any $q \in \mathbb{C}$,

$$\lambda_{\phi,\infty}(q) = \frac{1}{d-1}\log|a| + \int_{J(f)} \log|x - q|\,dm(x), \qquad (3)$$

where $m$ is the maximal invariant measure for $f$ on $J(f)$.


Proof. Assume first that $q$ is in the domain of attraction of $\infty$ for $f$. The zeros of the polynomial $f_n(x) = f^{(n)}(x) - x$ are precisely the solutions of the equation $f^{(n)}(x) = x$. Note that $d_n = \deg(f_n) = d^n$, where $d = \deg(f)$. Since $|f^{(n)}(q)| \to \infty$, $\frac{1}{d_n}\log|f_n(q)|$ is approximately $\frac{1}{d^n}\log|f^{(n)}(q)|$, which converges to $\lambda_{\phi,\infty}(q)$. Since $q$ lies in the open Fatou set, $\log|x - q|$ is continuous on $J(f)$. Now

$$\frac{1}{d_n}\log|f_n(q)| = \frac{1}{d_n}\sum_{f^{(n)}(x) = x} \log|x - q| + \frac{1}{d_n}\log|B_n|, \qquad (4)$$

where the sum is over the $n$th 'division points' and

$$B_n = a^{1 + d + \cdots + d^{n-1}}$$

is the leading coefficient of $f^{(n)}(x)$. Thus

$$\frac{1}{d_n}\log|B_n| = \frac{1}{d^n}\left(\frac{d^n - 1}{d - 1}\right)\log|a| \to \frac{1}{d-1}\log|a|. \qquad (5)$$

Now it is known that

$$\frac{1}{d_n}\sum_{f^{(n)}(x) = x} \log|x - q| \to \int_{J(f)} \log|x - q|\,dm(x),$$

where $m$ is the maximal invariant measure for $f$ restricted to the Julia set (see [11], [9]).

In the following it is convenient to assume that $a = 1$; we can ensure this by conjugating by a linear map. Now $|x - f(q)| = \prod_{f(t) = x} |t - q|$, so

$$\int_{J(f)} \log|x - f(q)|\,dm(x) = \int_{J(f)} \sum_{f(t) = x} \log|t - q|\,dm(x) = d\int_{J(f)} \log|x - q|\,dm(x). \qquad (6)$$

(The last equality follows from [11] or [9, Theorem (d)].)

Let now $q \notin J(f)$ have bounded orbit. Since $J(f)$ is closed, we can find $\epsilon > 0$ such that $B_\epsilon(q) \cap J(f) = \emptyset$. If $|f^{(n)}(q) - q| > \epsilon/2$ for almost all $n$, then

$$\frac{1}{d^n}\log|f^{(n)}(q) - q| \to 0.$$

We can argue as in the first case to get

$$\int_{J(f)} \log|x - q|\,dm(x) = 0 = \lambda_{\phi,\infty}(q).$$

Assume now $|f^{(n_j)}(q) - q| < \epsilon/2$ for some sequence $n_j \to \infty$. Then $|f^{(n_j)}(q) - x| > \epsilon/2$ for $x \in J(f)$. However, since $J(f)$ and $f^{(n_j)}(q)$ are bounded, we have also an upper bound

$$\log\frac{\epsilon}{2} \leq \int_{J(f)} \log|x - f^{(n_j)}(q)|\,dm(x) \leq M.$$

Together with Equation (6) we get

$$\frac{1}{d^{n_j}}\log\frac{\epsilon}{2} \leq \int_{J(f)} \log|x - q|\,dm(x) \leq \frac{1}{d^{n_j}}M,$$

which concludes the proof in this case.

It remains to show that the formula holds for $q \in J(f)$. Since $J(f)$ has no interior, there is a sequence $q_n \to q$ with $q_n \notin J(f)$. Then $\log|x - q_n| \to \log|x - q|$ for all $x \in J(f) \setminus \{q\}$. Since $J(f)$ is bounded, $\log|x - q_n|$ and $\log|x - q|$ are uniformly bounded above by $M$ say for $x \in J(f) \setminus \{q\}$. So by Fatou's lemma

$$0 = \lim_{n\to\infty} \int_{J(f)} \log|x - q_n|\,dm(x) \leq \int_{J(f)} \log|x - q|\,dm(x) \leq M. \qquad (7)$$

This shows that $x \mapsto \log|x - q|$ is in $L^1(m)$. If

$$\int_{J(f)} \log|x - q|\,dm(x) > 0,$$

then Equation (6) contradicts (7). □


3. THE p-ADIC CASE


Let $\mathbb{C}_p$ denote the usual completion of the algebraic closure of the $p$-adic numbers $\mathbb{Q}_p$, and use $|\cdot|$ to denote the extension of the $p$-adic norm to $\mathbb{C}_p$. Write $\mathcal{O}_p$ for the ring of integral elements in $\mathbb{C}_p$, and $P$ for the maximal ideal of $\mathcal{O}_p$. In this section we assume that $\phi : \mathbb{P}^1(\mathbb{C}_p) \to \mathbb{P}^1(\mathbb{C}_p)$ is a morphism of degree $d$ corresponding to an affine polynomial $f$ of degree $d$ with coefficients in $\mathcal{O}_p$ and leading coefficient in $\mathcal{O}_p^*$. Notice that these assumptions are, for polynomials, equivalent to the assumption that the map $\phi$ has good reduction in the sense of [12]: $\phi$ induces a morphism of schemes over $\mathrm{Spec}(\mathcal{O}_p)$. The Julia set is empty in this setting (see [2], [12]), so a direct analogue of (3) is not possible.

The main result expresses the local canonical morphic height as a 
limiting integral over periodic points for the polynomial f. 


Theorem 3.1. If $\phi$ has good reduction and is defined by a polynomial $f$ of degree $d$, then

$$\lambda_{\phi,p}(q) = \log^+|q| = \lim_{n\to\infty} \frac{1}{d^n} \sum_{\xi \neq q:\ f^{(n)}(\xi) = \xi} \log|\xi - q|,$$

where the sum is taken with multiplicities.


Notice first that for $q \in \mathbb{C}_p \setminus \mathcal{O}_p$ this is clear, so from now on we assume that $q \in \mathcal{O}_p$. Despite the simple resulting value of the height, the convergence involved requires an argument. The main issue is to produce lower bounds on the size of $|\xi - \zeta|$ for distinct periodic points $\xi$ and $\zeta$.

The least period of a periodic point $\xi$ is the cardinality of the orbit of $\xi$. The points of period $n$ are the solutions to the polynomial equation

$$f^{(n)}(x) - x = 0, \qquad (8)$$

and are therefore all elements of $\mathcal{O}_p$. Following [12], for a periodic point $\xi$ define $a_n(\xi)$ to be the multiplicity of $\xi$ in (8), with the obvious convention that $\xi$ has multiplicity zero in an equation that it does not satisfy. Notice that $a_n(\xi) \neq 0$ if and only if $n$ is a multiple of the least period of $\xi$. Define $a_n^*(\xi)$ by

$$a_n^*(\xi) = \sum_{d \mid n} \mu\!\left(\frac{n}{d}\right) a_d(\xi),$$

where $\mu$ is the Möbius function. Increases in the multiplicity of the periodic point $\xi$ along the sequence of multiples of its least period are recorded by $a_n^*(\xi)$. The periodic point $\xi$ is an essential $n$-periodic point if $a_n^*(\xi) > 0$.
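The Möbius sum defining $a_n^*(\xi)$ is easy to evaluate once the multiplicities $a_n(\xi)$ are known. The sketch below uses a made-up multiplicity sequence (a point of least period 3 whose multiplicity jumps at multiples of 6); the sequence and the function names are hypothetical and serve only to show how $a_n^*$ records the jumps.

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def a_star(a, n):
    """a*_n = sum over divisors d of n of mu(n/d) * a(d)."""
    return sum(mobius(n // d) * a(d) for d in range(1, n + 1) if n % d == 0)

# Hypothetical multiplicities: zero unless 3 | n, equal to 1 when 3 | n,
# and equal to 2 when 6 | n.
def a(n):
    if n % 3 != 0:
        return 0
    return 2 if n % 6 == 0 else 1

print([(n, a_star(a, n)) for n in range(1, 13)])
```

For this sequence $a_n^*$ is non-zero exactly at $n = 3$ and $n = 6$, the two places where the multiplicity increases.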

The following proposition is a special case of [12, Prop. 3.2]. 


Proposition 3.2. Let $K$ be an algebraically closed field of characteristic $p \geq 0$, and $f$ a polynomial over $K$ with degree $d \geq 2$. Fix a periodic point $\zeta \in K$ with least period $m$, and let $r$ denote the multiplicative order of $(f^{(m)})'(\zeta)$ in $K^*$ or $\infty$ if $(f^{(m)})'(\zeta)$ is not a root of unity. Then for $n \geq 1$, $a_n^*(\zeta) \geq 1$ if and only if one of the following conditions holds.

1. $n = m$;

2. $n = mr$;

3. $p > 0$ and $n = p^e mr$ for some $e \geq 1$.


When $K = \mathbb{C}_p$, this proposition will also be applied to the polynomial $\bar{f}$ induced by reduction mod $P$. Notice that the sum of the multiplicities of the points of period $n$ under $f$ lying in one residue class gives the multiplicity of the image point as a point of period $n$ under $\bar{f}$.

For the proof of Theorem 3.1 the following proposition is needed; 
this will be proved later. 


Proposition 3.3. Suppose $\xi_n$ is a periodic point with least period $n$ for a polynomial $f$ of good reduction. Then for any fixed $q \in \mathcal{O}_p$, $|q - \xi_n| \to 1$ as $n \to \infty$.


Proof. (of Theorem 3.1) By Proposition 3.2 applied to the field $\mathbb{C}_p$ with characteristic zero, if $\xi$ is a periodic point with least period $m$, then the multiplicity of $\xi$ viewed as a periodic point of period $m, 2m, 3m, \ldots$ is uniformly bounded. Fix $q \in \mathcal{O}_p$ and $s \in (0,1)$. Proposition 3.3 says that the number of periodic points in the metric ball $D_s(q)$ is finite. It follows that

$$\liminf_{n\to\infty} \frac{1}{d^n} \sum_{\xi \neq q:\ f^{(n)}(\xi) = \xi} \log|\xi - q| \geq \log s.$$

On the other hand, each term in the sum is non-positive, so letting $s \to 1$ proves the theorem. □


All that remains is to prove Proposition 3.3, for which we need some 
lemmas. 


Lemma 3.4. Assume that $f(0) = 0$, and let $\zeta$ be a periodic point of $f$. Then $|\zeta| = |f^{(n)}(\zeta)|$ for all $n \geq 1$.


Proof. The spherical metric used in [12] coincides with the usual metric in $\mathcal{O}_p$, and $f$ has good reduction. So by [12, Prop. 5.2],

$$|f(x) - f(y)| \leq |x - y|$$

for $x, y \in \mathcal{O}_p$. The lemma follows at once. □
Lemma 3.5. Assume that $f(0) = 0$ and $n > 1$ is fixed.
1. If |f’(O)| «1 then Tez |€| = 1. 


! _ at 1 ifn 18 a power of p, 
2. If|f'(0)—1| < p™* then Tego [ξ| 556) =| Ἢ ἜΝ ἥ Ip 


Proof. In case 1., $\bar{f}'(0) = 0$ in the algebraically closed field $\mathcal{O}_p/P$, so Proposition 3.2 may be applied with $\zeta = 0 + P$, $m = 1$ and $r = \infty$. It follows that for $n > 1$, $a_n^*(0 + P) = 0$, so there cannot be an essential $n$-periodic point $\xi$ for $f$ with $|\xi| < 1$.
In case 2., $m = 1$ and $r = 1$ for the point $\zeta = 0 + P$ in Proposition 3.2. It follows that only values of $n$ of the form $p^e$ are relevant. Notice
[I lel = 


that 
fOr(a) = a 
E£0 Fao " =). 
If f’(0) #1, then the right-hand side of (9) is given by 
(f’(0)P" = 1 = + 
(f'(0))P"" — 1] p 


by the binomial theorem. If $f'(0) = 1$, write $f(x) = x + x^e g(x)$ with $e > 1$ and $g(0) \neq 0$, then a simple induction argument shows that


f(x) = 24 κα σ(α)- Ο(ω" ἢ. 
It follows that (9) is equal to p~! again. ΓῚ 


(9) 


Proof. (of Proposition 3.3) Let $\zeta$ be any periodic point, with least period $\ell$. The first step is to prove the proposition for $q = \zeta$. Let $\xi$ have least period $n$ under $f$. The multiplicity of $\zeta + P$, which has least period $m$ for some $m \mid \ell$, must increase at $\ell$ (because $a_\ell^*(\zeta) \geq 1$). It follows by Proposition 3.2 that $\ell$ is equal to $m$, $mr$, or $mrp^e$ for some $e \geq 1$. Assume first that $n$ is not of one of those forms; then $|\xi - \zeta| = 1$ because the multiplicity of $\zeta + P$ cannot increase at $n$ in $\mathcal{O}_p/P$.






In the remaining cases, we may assume for large $n$ that $\ell \mid n$. Then $\xi$ is a periodic point with least period $n/\ell$ under $f^{(\ell)}$. Applying the conjugation $x \mapsto x - \zeta$ means that 0 is a fixed point of $g$, where $g(x) = f^{(\ell)}(x + \zeta) - \zeta$.

If $|g'(0)| < 1$, then by Lemma 3.5 applied to $g$, $|\xi - \zeta| = 1$.

If $|g'(0)| = 1$, let $t$ be the order of $g'(0) + P$ in $\mathcal{O}_p/P$. Then

$$|(g'(0))^t - 1| < 1.$$

There exists a $c \geq 1$ such that

$$|(g'(0))^{tp^c} - 1| < 1/p.$$

As before, we may assume that $tp^c\ell \mid n$, so Lemma 3.5 may be applied to the map $h = g^{(tp^c)}$ and the periodic point $\xi - \zeta$ of least period $n/tp^c\ell$ to give

$$\prod_{j = 1, \ldots, n/tp^c\ell} |h^{(j)}(\xi - \zeta)| \geq \frac{1}{p}.$$

It follows by Lemma 3.4 that $|\xi - \zeta| \geq p^{-tp^c\ell/n}$. Since $t, c, \ell$ depend only on $\zeta$, $|\xi - \zeta| \to 1$ as $n \to \infty$. The ultrametric inequality in $\mathbb{C}_p$ now gives the result for any $q \in \mathcal{O}_p$. □


Remark 3.6. Notice that the discussion above also gives a quantitative version of Proposition 3.3. This Diophantine result may be of independent interest. If $f$ is a polynomial of good reduction, then

$$|f^{(n)}(q) - q| \geq C(f, q)\,|n|$$

for all $n \geq 1$, provided the left hand side is non-zero.


Example 3.7. To see the different cases that are possible in Proposition 3.3, consider the following examples.

1. Let $f(x) = g(x^p) + ph(x)$ be a monic polynomial with coefficients in $\mathcal{O}_p$. Then $|f'(q)| < 1$ for any $q \in \mathcal{O}_p$, so in Lemma 3.5 only the first case is ever used. It follows that in Proposition 3.3, $|\zeta - \xi| = 1$ for any distinct periodic points $\zeta, \xi$.

2. Let $f(x) = x^2 - (1 + a)x$ for some small $a$. Then 0 is a fixed point, and $|f'(0)| = 1$. Now

$$f^{(2)}(x) - x = x^4 - (2a + 2)x^3 + (a^2 + a)x^2 + (a^2 + 2a)x,$$

so $(f^{(2)}(x) - x)/(f(x) - x)$ has constant term $(a^2 + 2a)/(-a - 2) = -a$. Therefore there must be two non-zero points of period 2 that are close to the fixed point 0.
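The expansion in the second item of Example 3.7 can be confirmed by exact polynomial arithmetic. The following sketch represents polynomials as integer coefficient lists (lowest degree first) and uses an integer value for $a$ purely for illustration; the helper names are hypothetical.

```python
def poly_mul(p, q):
    """Product of two coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two coefficient lists, padded to the longer length."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_compose(p, q):
    """Coefficients of p(q(x)) by Horner's rule."""
    out = [p[-1]]
    for c in reversed(p[:-1]):
        out = poly_add(poly_mul(out, q), [c])
    return out

def poly_divmod(num, den):
    """Long division by a monic polynomial with integer coefficients."""
    num = num[:]
    quo = [0] * (len(num) - len(den) + 1)
    for k in range(len(quo) - 1, -1, -1):
        quo[k] = num[k + len(den) - 1]
        for j, b in enumerate(den):
            num[k + j] -= quo[k] * b
    return quo, num[: len(den) - 1]

def period_two_quotient(a):
    """Return f^(2)(x) - x and its quotient and remainder on division by f(x) - x."""
    f = [0, -(1 + a), 1]                                  # f(x) = x^2 - (1 + a) x
    f2_minus_x = poly_add(poly_compose(f, f), [0, -1])    # f(f(x)) - x
    f_minus_x = poly_add(f, [0, -1])                      # f(x) - x
    return f2_minus_x, poly_divmod(f2_minus_x, f_minus_x)

f2_minus_x, (quotient, remainder) = period_two_quotient(5)
print(f2_minus_x)   # coefficients of x^4 - (2a+2)x^3 + (a^2+a)x^2 + (a^2+2a)x
print(quotient, remainder)
```

The constant term of the quotient is $-a$, so the product of the two points of exact period 2 has absolute value $|a|$, which is small when $a$ is.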


If the polynomial $f$ has coefficients outside $\mathcal{O}_p$, then in contrast to Proposition 3.3, there may be sequences of periodic points converging to a periodic point. For example, $f(x) = x^2 + \frac{1}{2}x$ on $\mathbb{C}_2$ has this property. Therefore, to recover Theorem 3.1 in greater generality (for polynomials of bad reduction or rational functions) some kind of Diophantine approximation results are needed. In Example 1.1.2 these tools are provided by elliptic transcendence theory.


Proposition 3.8. Let $\zeta$ be a periodic point with least period $\ell$ under $f$. Assume that $|(f^{(\ell)})'(\zeta)| > 1$. Then there are periodic points $\xi \neq \zeta$ arbitrarily close to $\zeta$.


Proof. Define $g = f^{(\ell)}$ and $a = (f^{(\ell)})'(\zeta)$. Without loss of generality we can assume that $\zeta = 0$. Then

$$g(x) = ax + bx^e + O(x^{e+1}) \text{ with } b \neq 0$$

and

$$g^{(2)}(x) = a^2x + (ab + a^eb)x^e + O(x^{e+1}).$$

Define $b_2 = (ab + a^eb)$, then $|b_2| = |b||a|^e$. By induction one can see that

$$g^{(k)}(x) = a^kx + b_kx^e + O(x^{e+1})$$

with $|b_k| = |b||a|^{(k-1)e}$.
Therefore the Newton polygon of $g^{(k)}(x) - x$ starts with a line with slope $s \leq -k + c$ for a fixed $c$ (depending on $b$ and $e$). So there exists a periodic point $\xi$ with $|\xi| = p^s \leq p^{-k+c}$. □


4. TCHEBYCHEFF POLYNOMIALS


Example 4.1. Consider the Tchebycheff polynomial of degree $d$, $f(z) = T_d(z) = \cos(d\arccos(z))$. The Julia set is the interval $J(f) = [-1, 1]$. The map $\phi : \mathbb{C} \to \mathbb{C}$ given by $\phi(z) = \frac{1}{2}(z + z^{-1})$ is a semi-conjugacy from $g : z \mapsto z^d$ onto $z \mapsto f(z)$, in other words, $f(\phi(z)) = \phi(z^d)$. Write $\psi$ for the branch of the inverse of $\phi$ defined on $\{z \in \mathbb{C} \mid |z| > 1\}$. The canonical morphic height at the infinite place is (for $q \notin J(f)$)

$$\lambda_{\phi,\infty}(q) = \lim_{n\to\infty} \frac{1}{d^n}\log^+|f^{(n)}(q)|$$
$$= \lim_{n\to\infty} \frac{1}{d^n}\log^+|\phi(g^{(n)}\psi(q))|$$
$$= \lim_{n\to\infty} \frac{1}{d^n}\log^+\left|\frac{1}{2}\left[\psi(q)^{d^n} + \psi(q)^{-d^n}\right]\right|$$
$$= \lim_{n\to\infty} \max\left\{0, \frac{1}{d^n}\log|g^{(n)}\psi(q)|\right\}$$
$$= \log^+|\psi(q)|.$$

For $q \in J(f)$, the same formula holds since $\lambda_{\phi,\infty}(q) = 0$ there by [3] and $\log^+|\psi(q)| = 0$ there by a direct calculation.

Now by Jensen's formula, for any $q \in \mathbb{C}$,

$$\log^+|\psi(q)| = \log 2 + \int_{S^1} \log|\phi(y) - q|\,dy = \log 2 + \int_{[-1,1]} \log|t - q|\,dm(t),$$

since $m$ is the image under $\phi$ of the maximal measure (Lebesgue) on the circle. That is,

$$\lambda_{\phi,\infty}(q) = \log 2 + \int_{J(f)} \log|t - q|\,dm(t). \qquad (10)$$

The constant $\log 2$ in $\lambda_{\phi,\infty}(q)$ may be explained in accordance with Theorem 2.1. The leading coefficient of $T_d$ is $2^{d-1}$, so $\frac{1}{d-1}\log|a|$ in this case is exactly $\log 2$.


A similar approach can be adopted in the case of polynomials with 
connected Julia sets. There the local conjugacy near oo extends to the 
whole domain of attraction of co, which is the complement of the filled 
Julia set. 


Example 4.2. As before, let $f(x) = T_d(x) = \cos(d \arccos(x))$ be the
Tchebycheff polynomial of degree $d$ and let $\phi$ be the corresponding
morphism. We would like to use Theorem 3.1, but $f$ does not satisfy
the assumptions since it is not monic.

Let $g(x) = 2f(\frac{x}{2})$. Notice that $f$ is defined uniquely by the property
$f(\frac{1}{2}(z + z^{-1})) = \frac{1}{2}(z^{d} + z^{-d})$. It follows that $g$ is characterized by the
property $g(z + z^{-1}) = z^{d} + z^{-d}$, which shows that $g \in \mathbb{Z}[x]$ is a monic
polynomial. Let $\psi$ be the morphism defined by $g$, then by Theorem 3.1
we have for $q \in \mathbb{C}_p$,
\[ \lambda_{\psi,p}(q) = \log^{+} |q| = \lim_{n\to\infty} \frac{1}{d^{n}} \sum_{\xi \neq q :\, g^{(n)}(\xi) = \xi} \log |\xi - q|. \tag{11} \]

Since $g(x) = 2f(\frac{x}{2})$, we have that $\lambda_{\phi,p}(q) = \lambda_{\psi,p}(2q)$ and on the right
hand side of (11) that $f^{(n)}(\xi) = \xi$ if and only if $g^{(n)}(2\xi) = 2\xi$. Therefore
\[ \lambda_{\phi,p}(q) = \log^{+} |2q| = \log |2| + \lim_{n\to\infty} \frac{1}{d^{n}} \sum_{\xi \neq q :\, f^{(n)}(\xi) = \xi} \log |\xi - q|, \]
which is again analogous to Equation (10) in Example 4.1.


Example 4.2 works because the Tchebycheff polynomial can be conjugated
to a polynomial of good reduction; a similar approach can be
adopted for any polynomial that is conjugate to one of good reduction.
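
The two identities used in Examples 4.1 and 4.2 can be checked by machine. The following is only a sketch under stated assumptions: the helper names (`chebyshev_coeffs`, `rescaled_coeffs`, `evaluate`, `phi`) are hypothetical and not taken from the paper, and the polynomial $T_d$ is built from the standard recurrence $T_{k+1}(x) = 2xT_k(x) - T_{k-1}(x)$ rather than from the cosine definition. It illustrates the semi-conjugacy $f(\phi(z)) = \phi(z^{d})$ at a sample point and the fact that $g(x) = 2f(x/2)$ has integer coefficients and is monic.

```python
from fractions import Fraction


def chebyshev_coeffs(d):
    """Integer coefficients of T_d, lowest degree first.

    Uses the recurrence T_{k+1} = 2x T_k - T_{k-1} (an assumption of this
    sketch; the text defines T_d through cos(d arccos x)).
    """
    prev, cur = [1], [0, 1]  # T_0 = 1, T_1 = x
    if d == 0:
        return prev
    for _ in range(d - 1):
        nxt = [0] + [2 * c for c in cur]  # multiply T_k by 2x
        for i, c in enumerate(prev):      # subtract T_{k-1}
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def rescaled_coeffs(d):
    """Coefficients of g(x) = 2 T_d(x/2) as exact fractions."""
    return [2 * Fraction(c, 2 ** i) for i, c in enumerate(chebyshev_coeffs(d))]


def evaluate(coeffs, z):
    """Horner evaluation of a polynomial given lowest degree first."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def phi(z):
    """The semi-conjugacy phi(z) = (z + 1/z) / 2 of Example 4.1."""
    return (z + 1 / z) / 2


if __name__ == "__main__":
    d, z = 5, 1.3 + 0.4j
    # Semi-conjugacy: T_d(phi(z)) should agree with phi(z^d) up to rounding.
    print(abs(evaluate(chebyshev_coeffs(d), phi(z)) - phi(z ** d)))
    # g = 2 T_d(x/2): every coefficient is an integer and the top one is 1.
    print(rescaled_coeffs(d))
```

For $d = 5$ the rescaled polynomial is $x^{5} - 5x^{3} + 5x$, whose leading coefficient $2 \cdot 2^{d-1}/2^{d} = 1$ reflects the leading coefficient $2^{d-1}$ of $T_d$ noted after Equation (10).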




REFERENCES 
1. A. Beardon, Iteration of Rational Functions, Springer, New York, 1991. 
2. R. Benedetto, Reduction, dynamics, and Julia sets of rational functions, J.
Number Theory, to appear.

3. G.S. Call and S.W. Goldstine, Canonical heights on projective space, J. Number 
Theory 63 (1997), 211-243. 

4. G.S. Call and J.H. Silverman, Canonical heights on varieties with morphisms, 
Compositio Math. 89 (1993), 163-205. 

5. P. D’Ambros, G. Everest, R. Miles, and T. Ward, Dynamical systems arising 
from elliptic curves, Colloq. Math. (to appear) (2000). 

6. M. Einsiedler, G. Everest, and T. Ward, Entropy and the canonical height, 
Preprint. 

7. G. Everest and T. Ward, A dynamical interpretation of the global canonical 
height on an elliptic curve, Experiment. Math. 7 (1998), 305-316. 

8. N. Koblitz, p-adic analysis: a short course on recent work, LMS Lecture Notes
46, Cambridge Univ. Press, 1980.

9. A. Freire, A. Lopes, and R. Mané, An invariant measure for rational maps, 
Bol. Soc. Brasil. Mat. 14 (1983), no. 1, 45-62. 

10. D.A. Lind and T. Ward, Automorphisms of solenoids and p-adic entropy, Er- 
godic Theory Dynam. Systems 8 (1988), 411-419. 

11. M.Y. Lyubich, Entropy of analytic endomorphisms of the Riemann sphere, 
Funktsional. Anal. i Prilozhen. 15 (1981), 83-84. 

12. P. Morton and J.H. Silverman, Periodic points, multiplicities, and dynamical 
units, J. Reine Angew. Math. 461 (1995), 81-122. 

13. W. Rudin, Real and Complex Analysis, McGraw-Hill, New York, 1974. 

14. L.G. Shnirel’man, On functions in normed algebraically closed division rings,
Izv. Akad. Nauk. SSSR Ser. Mat. 2 (1938), 487-498 (Russian).


(M.E.) MATHEMATICAL INSTITUTE, UNIVERSITY OF VIENNA, STRUDLHOFGASSE 4,
A-1090 WIEN, AUSTRIA.
E-mail address: manfred@mat.univie.ac.at 


(G.E. & T.W.) SCHOOL OF MATHEMATICS, UNIVERSITY OF EAST ANGLIA, 
NORWICH NR4 7TJ, UK. 

E-mail address: g.everest@uea.ac.uk 

E-mail address: t.ward@uea.ac.uk


THE ELEMENTARY PROOF OF THE PRIME NUMBER THEOREM: 
AN HISTORICAL PERSPECTIVE 


(by D. Goldfeld) 


The study of the distribution of prime numbers has fascinated mathematicians since
antiquity. It is only in modern times, however, that a precise asymptotic law for the number
of primes in arbitrarily long intervals has been obtained. For a real number $x > 1$, let $\pi(x)$
denote the number of primes less than $x$. The prime number theorem is the assertion that
\[ \lim_{x\to\infty} \pi(x) \Big/ \frac{x}{\log(x)} = 1. \]


This theorem was conjectured independently by Legendre and Gauss. 


The approximation
\[ \pi(x) = \frac{x}{A\log(x) + B} \]
was formulated by Legendre in 1798 [Le1] and made more precise in [Le2] where he
provided the values $A = 1$, $B = -1.08366$. On August 4, 1823 (see [La1], page 6) Abel,
in a letter to Holmboe, characterizes the prime number theorem (referring to Legendre) as
perhaps the most remarkable theorem in all mathematics.

Gauss, in his well known letter to the astronomer Encke (see [La1], page 37), written
on Christmas eve 1849, remarks that his attention to the problem of finding an asymptotic
formula for $\pi(x)$ dates back to 1792 or 1793 (when he was fifteen or sixteen), and at
that time noticed that the density of primes in a chiliad (i.e. $[x, x + 1000]$) decreased
approximately as $1/\log(x)$ leading to the approximation
\[ \pi(x) \approx \mathrm{Li}(x) = \int_{2}^{x} \frac{dt}{\log(t)}. \]
The remarkable part is the continuation of this letter, in which he said (referring to
Legendre’s $\frac{x}{\log(x) - A(x)}$ approximation and Legendre’s value $A(x) = 1.08366$) that whether
the quantity $A(x)$ tends to 1 or to a limit close to 1, he does not dare conjecture.

The first paper in which something was proved at all regarding the asymptotic
distribution of primes was Tchebychef’s first memoir ([Tch1]) which was read before the
Imperial Academy of St. Petersburg in 1848. In that paper Tchebychef proved that if any
approximation to $\pi(x)$ held to order $x/\log(x)^{N}$ (with some fixed large positive integer $N$)
then that approximation had to be $\mathrm{Li}(x)$. It followed from this that Legendre’s conjecture
that $\lim_{x\to\infty} A(x) = 1.08366$ was false, and that if the limit existed it had to be 1.

The first person to show that $\pi(x)$ has the order of magnitude $\frac{x}{\log(x)}$ was Tchebychef
in 1852 [Tch2]. His argument was entirely elementary and made use of properties of
factorials. It is easy to see that the highest power of a prime $p$ which divides $x!$ (we
assume $x$ is an integer) is simply
\[ p^{[x/p] + [x/p^{2}] + [x/p^{3}] + \cdots} \]




D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 


where $[t]$ denotes the greatest integer less than or equal to $t$. It immediately follows that
\[ x! = \prod_{p \le x} p^{[x/p] + [x/p^{2}] + \cdots} \]
and
\[ \log(x!) = \sum_{p \le x} \left( \left[\frac{x}{p}\right] + \left[\frac{x}{p^{2}}\right] + \left[\frac{x}{p^{3}}\right] + \cdots \right) \log(p). \]

Now $\log(x!)$ is asymptotic to $x\log(x)$ by Stirling’s asymptotic formula, and, since squares,
cubes, ... of primes are comparatively rare, and $[x/p]$ is almost the same as $x/p$, one may
easily infer that
\[ x \sum_{p \le x} \frac{\log(p)}{p} = x\log(x) + O(x) \]
from which one can deduce that $\pi(x)$ is of order $\frac{x}{\log(x)}$. This was essentially the method of
Tchebychef, who actually proved that [Tch2]
\[ B < \pi(x) \Big/ \frac{x}{\log(x)} < \frac{6B}{5} \]
for all sufficiently large numbers $x$, where
\[ B = \frac{\log 2}{2} + \frac{\log 3}{3} + \frac{\log 5}{5} - \frac{\log 30}{30} \approx 0.92129 \]
and
\[ \frac{6B}{5} \approx 1.10555. \]
Unfortunately, however, he was unable to prove the prime number theorem itself this way,
and the question remained as to whether an elementary proof of the prime number theorem
could be found.
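
The factorial identity behind Tchebychef's method is easy to test numerically. The sketch below is illustrative only: the function names are hypothetical, $\pi(x)$ is taken to count primes not exceeding $x$ (which differs from the convention above only when $x$ is prime), and a simple sieve stands in for any table of primes. It computes the exponent $[x/p] + [x/p^{2}] + \cdots$, rebuilds $\log(x!)$ from it, and evaluates the ratio $\pi(x) \big/ \frac{x}{\log(x)}$ that Tchebychef bounded.

```python
import math


def primes_up_to(n):
    """All primes p <= n (n >= 2) by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def legendre_exponent(x, p):
    """Exponent of the prime p in x!, i.e. [x/p] + [x/p^2] + ..."""
    total, power = 0, p
    while power <= x:
        total += x // power
        power *= p
    return total


def log_factorial_via_primes(x):
    """log(x!) written as a sum over primes p <= x."""
    return sum(legendre_exponent(x, p) * math.log(p) for p in primes_up_to(x))


def chebyshev_ratio(x):
    """pi(x) / (x / log x), with pi(x) counting primes <= x."""
    return len(primes_up_to(x)) * math.log(x) / x


if __name__ == "__main__":
    # The two expressions for log(20!) should agree up to rounding.
    print(log_factorial_via_primes(20), math.lgamma(21))
    # The ratio should lie between B and 6B/5 once x is large enough.
    print(chebyshev_ratio(10 ** 6))
```

The bounds hold only for sufficiently large $x$, so small inputs to `chebyshev_ratio` may fall outside the interval $(B, 6B/5)$.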

Over the years there were various improvements on Tchebychef’s bound, and in 1892
Sylvester [Syl1], [Syl2] was able to show that
\[ 0.956 < \pi(x) \Big/ \frac{x}{\log(x)} < 1.045 \]
for all sufficiently large $x$. We quote from Harold Diamond’s excellent survey article [D]:


The approach of Sylvester was ad hoc and computationally complex; it offered 
no hope of leading to a proof of the P.N.T. Indeed, Sylvester concluded in his 
article with the lament that “...we shall probably have to wait [for a proof of 
the P.N.T. ] until someone is born into the world so far surpassing Tchebychef 
in insight and penetration as Tchebychef has proved himself superior in these 
qualities to the ordinary run of mankind.”




The first proof of the prime number theorem was given by Hadamard [H1], [H2]
and de la Vallée Poussin [VP] in 1896. The proof was not elementary and made use of
Hadamard’s theory of integral functions applied to the Riemann zeta function $\zeta(s)$ which
is defined by the absolutely convergent series
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s}, \]
for $\mathrm{Re}(s) > 1$. A second component of the proof was a simple trigonometric identity
(actually, Hadamard used the doubling formula for the cosine function) applied in an
extremely clever manner to show that the zeta function didn’t vanish on the line $\mathrm{Re}(s) = 1$.
Later, several simplified proofs were given, in particular by Landau [L] and Wiener [W1],
[W2], which avoided the Hadamard theory.

In 1921 Hardy (see [B]) delivered a lecture to the Mathematical Society of Copen- 
hagen. He asked: 


“No elementary proof of the prime number theorem is known, and one may 
ask whether it is reasonable to expect one. Now we know that the theorem 
is roughly equivalent to a theorem about an analytic function, the theorem 
that Riemann’s zeta function has no roots on a certain line. A proof of such 
a theorem, not fundamentally dependent on the theory of functions, seems to 
me extraordinarily unlikely. It is rash to assert that a mathematical theorem 
cannot be proved in a particular way; but one thing seems quite clear. We have 
certain views about the logic of the theory; we think that some theorems, as we 
say ‘lie deep’ and others nearer to the surface. If anyone produces an elementary 
proof of the prime number theorem, he will show that these views are wrong, 
that the subject does not hang together in the way we have supposed, and that 
it is time for the books to be cast aside and for the theory to be rewritten.”


In the year 1948 the mathematical world was stunned when Paul Erdős announced
that he and Atle Selberg had found a truly elementary proof of the prime number theorem
which used only the simplest properties of the logarithm function. Unfortunately, this
announcement and subsequent events led to a bitter dispute between these two
mathematicians. The actual details of what transpired in 1948 have become distorted over time.
A short paper, “The elementary proof of the prime number theorem,” by E.G. Straus has
been circulating for many years and has been the basis for numerous assertions over what
actually happened. In 1987 I wrote a letter to the editors of the Atlantic Monthly (which
was published) in response to an article about Erdős [Ho] which discussed the history of
the elementary proof of the prime number theorem. At that time Selberg sent me his file
of documents and letters (this is now part of [G]). Having been a close and personal friend
of Erdős and also Selberg, having heard both sides of the story, and finally having a large
collection of letters and documents in hand, I felt the time had come to simply present the
facts of the matter with supporting documentation.

Let me begin by noting that in 1949, with regard to Paul Erdős’s paper, “On a
new method in elementary number theory which leads to an elementary proof of the prime
number theorem,” the Bulletin of the American Mathematical Society informed Erdős that
the referee does not recommend the paper for publication. Erdős immediately withdrew
the paper and had it published in the Proceedings of the National Academy of Sciences
[E]. At the same time Atle Selberg published his paper, “An elementary proof of the
prime-number theorem,” in the Annals of Mathematics [S]. These papers were brilliantly
reviewed by A.E. Ingham [I].

The elementary proof of the prime number theorem was quite a sensation at the
time. For his work on the elementary proof of the PNT, the zeros of the Riemann zeta
function (showing that a positive proportion lie on the line $\mathrm{Re}(s) = \frac{1}{2}$), and the development of the
Selberg sieve method, Selberg received the Fields Medal [B] in 1950. Erdős received the
Cole Prize in 1952 [C]. The Selberg sieve method, a cornerstone in elementary number
theory, is the basis for Chen’s [Ch] spectacular proof that every sufficiently large even integer
is the sum of a prime and a number having at most two prime factors. Selberg is now
recognized as one of the leading mathematicians of this century for his introduction of
spectral theory into number theory culminating in his discovery of the trace formula
[A-B-G] which classifies all arithmetic zeta functions. Erdős has also left an indelible mark on
mathematics. His work provided the foundations for graph and hypergraph theory [C-G]
and the probabilistic method [A-S] with applications in combinatorics and elementary
number theory. At his death in 1996 he had more than 1500 published papers with many
coauthored papers yet to appear. It is clear that he has founded a unique school of
mathematical research, international in scope, and highly visible to the world at large.


Acknowledgment: The author would like to thank Enrico Bombieri, Melvyn Nathanson, 
and Atle Selberg for many clarifying discussions on historical detail. In addition I received 
a wide variety of helpful comments from Michael Anshel, Harold Diamond, Ron Graham, 
Dennis Hejhal, Jeff Lagarias, Attila Mate, Janos Pach, and Carl Pomerance. 


March 1948: Let $\vartheta(x) = \sum_{p \le x} \log(p)$ denote the sum over primes $p \le x$. The prime number
theorem is equivalent to the assertion that
\[ \lim_{x\to\infty} \frac{\vartheta(x)}{x} = 1. \]

In March 1948 Selberg proved the asymptotic formula
\[ \vartheta(x)\log(x) + \sum_{p \le x} \log(p)\,\vartheta\!\left(\frac{x}{p}\right) = 2x\log(x) + O(x). \]

He called this the fundamental formula.
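
A numerical experiment makes the shape of the fundamental formula concrete. What follows is a sketch, not part of the historical record: the names `theta_table` and `selberg_defect` are hypothetical, $x$ is restricted to integers so that $\vartheta(x/p)$ can be read from a table at $[x/p]$, and floating-point sums of logarithms are used throughout. It evaluates the left-hand side minus $2x\log(x)$, divided by $x$, which the formula asserts stays bounded.

```python
import math


def theta_table(n):
    """theta[k] = sum of log(p) over primes p <= k, for 0 <= k <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    theta, running = [0.0] * (n + 1), 0.0
    for k in range(2, n + 1):
        if sieve[k]:
            running += math.log(k)
        theta[k] = running
    return theta


def selberg_defect(x, theta):
    """(theta(x) log x + sum_{p<=x} log(p) theta(x/p) - 2 x log x) / x.

    theta[p] - theta[p - 1] equals log(p) when p is prime and 0 otherwise,
    so the sum over all integers 2..x picks out exactly the primes.
    """
    lhs = theta[x] * math.log(x)
    lhs += sum((theta[p] - theta[p - 1]) * theta[x // p] for p in range(2, x + 1))
    return (lhs - 2 * x * math.log(x)) / x


if __name__ == "__main__":
    table = theta_table(10 ** 5)
    for x in (10 ** 3, 10 ** 4, 10 ** 5):
        # theta(x)/x should be near 1; the defect should remain bounded.
        print(x, table[x] / x, selberg_defect(x, table))
```

Boundedness of the printed defect is consistent with the $O(x)$ term, but a finite computation of course proves nothing about the asymptotic statement.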
We quote from Erdős’s paper, “On a new method in elementary number theory which
leads to an elementary proof of the prime number theorem,” Proc. Nat. Acad. Scis. 1949:


“Selberg proved some months ago the above asymptotic formula, ... the 
ingenious proof is completely elementary ... Thus it can be used as a 
starting point for elementary proofs of various theorems which previously 
seemed inaccessible by elementary methods.” 




Quote from Selberg: Letter to H. Weyl Sept. 16, 1948 


“I found the fundamental formula ... in March this year ... I had found a 
more complicated formula with similar properties still earlier.” 


April 1948: Recall that $\vartheta(x) = \sum_{p \le x} \log(p)$. Define
\[ a = \liminf_{x\to\infty} \frac{\vartheta(x)}{x}, \qquad A = \limsup_{x\to\infty} \frac{\vartheta(x)}{x}. \]

Sylvester’s estimates guarantee that
\[ 0.956 \le a \le A \le 1.045. \]
In his letter to H. Weyl, Sept. 16, 1948, Selberg writes:
“I got rather early the result that $a + A = 2$,”
The proof that $a + A = 2$ is given as follows. Choose $x$ large so that
\[ \vartheta(x) = ax + o(x). \]

Then since $\vartheta(x) \le Ax + o(x)$ it follows from Selberg’s fundamental formula that
\[ ax\log(x) + \sum_{p \le x} A\,\frac{x}{p}\,\log(p) \ge 2x\log(x) + o(x\log(x)). \]
Using Tchebychef’s result that
\[ \sum_{p \le x} \frac{\log(p)}{p} \sim \log(x) \]
it is immediate that $a + A \ge 2$. On the other hand, we can choose $x$ large so that
\[ \vartheta(x) = Ax + o(x). \]

Then since $\vartheta(x) \ge ax + o(x)$ it immediately follows as before that
\[ Ax\log(x) + \sum_{p \le x} a\,\frac{x}{p}\,\log(p) \le 2x\log(x) + o(x\log(x)), \]
from which we get $a + A \le 2$. Thus
\[ a + A = 2. \]
Remark: Selberg was aware of the fact (already in April 1948) that $a + A = 2$, and that
the prime number theorem would immediately follow if one could prove either $a = 1$ or
$A = 1$.




May-July 1948: We again quote from Selberg’s letter to H. Weyl of Sept. 16, 1948. 


“In May I wrote down a sketch to the paper on Dirichlet’s theorem, during
June I did nothing except preparations to the trip to Canada. Then around
the beginning of July, Turan asked me if I could give him my notes on
the Dirichlet theorem so he could see it, he was going away soon, and
probably would have left when I returned from Canada. I not only agreed
to do this, but as I felt very much attached to Turan I spent some days
going through the proof with him. In this connection I mentioned the
fundamental formula to him, ... However, I did not tell him the proof
of the formula, nor about the consequences it might have and my ideas in
this connection ... I then left for Canada and returned after 9 days just as
Turan was leaving. It turned out that Turan had given a seminar on my
proof of the Dirichlet theorem where Erdős, Chowla, and Straus had been
present, I had of course no objection to this, since it concerned something
that was already finished from my side, though it was not published. In
connection with this Turan had also mentioned, at least to Erdős, the
fundamental formula, this I don’t object to either, since I had not asked
him not to tell this further.”


July 1948: Quote from E.G. Straus’ paper, “The elementary proof of the prime number 
theorem.” 


“Turan who was eager to catch up with the mathematical developments
that had happened during the war, talked with Selberg about his sieve
method and now famous inequality (Fundamental Formula). He tried to
talk Selberg into giving a seminar ... Selberg suggested Turan give the
seminar.

This Turan did for a small group of us, including Chowla, Erdős and
myself, ... After the lecture ... there followed a brief discussion of the
unexpected power of Selberg’s inequality.”

“Erdős said,

I think you can also derive
\[ \lim_{n\to\infty} \frac{p_{n+1}}{p_n} = 1 \]
from this inequality.

In any case within an hour or two Erdős had discovered an ingenious
derivation from Selberg’s inequality. After presenting an outline of the
proof to the Turan Seminar group, Erdős met Selberg in the hall and told
him he could derive $\frac{p_{n+1}}{p_n} \to 1$ from Selberg’s inequality.”

“Selberg responded something like this:

You must have made a mistake because with this result I can get an
elementary proof of the prime number theorem, and I have convinced myself
that my inequality is not powerful enough for that.”


Quote from Weyl’s letter to Selberg August 31, 1948 


“Is it not true that you were in possession of what Erdős calls the
fundamental inequality and of the equation $a + A = 2$ for several months
but could not prove $a = A = 1$ until Erdős deduced $\frac{p_{n+1}}{p_n} \to 1$ from your
inequality?”

Here is Selberg’s response in his letter to Weyl, Sept. 16, 1948. 


“Turan had mentioned to Erdős after my return from Montreal he told me
he was trying to prove $\frac{p_{n+1}}{p_n} \to 1$ from my formula.

Actually, I didn’t like that somebody else started working on my
unpublished results before I considered myself through with them.”

“But though I felt rather unhappy about the situation, I didn’t say
anything since after all Erdős was trying to do something different from what
I was interested in.

In spite of this, I became ... rather concerned that Erdős was working on
these things ...

I, therefore, started very feverishly to work on my own ideas. On Friday
evening Erdős had his proof ready (that $\frac{p_{n+1}}{p_n} \to 1$) and he told it to me.
On Sunday afternoon I got my first proof of the prime number theorem.
I was rather unsatisfied with the first proof because it was long and
indirect. After a few days (my wife says two) I succeeded in giving a different
proof.”
Quote from Erdős’s paper, “On a new method in elementary number theory which leads
to an elementary proof of the prime number theorem,” Proc. Nat. Acad. Scis. 1949:


“Using (1) (fundamental formula) I proved that $\frac{p_{n+1}}{p_n} \to 1$ as $n \to \infty$. In
fact, I proved the following slightly stronger result: To every $\varepsilon$ there exists
a positive $\delta(\varepsilon)$ so that for $x$ sufficiently large we have
\[ \pi(x(1+\varepsilon)) - \pi(x) > \delta(\varepsilon)\,x/\log(x) \]
where $\pi(x)$ is the number of primes not exceeding $x$.
I communicated this proof to Selberg, who, two days later ... deduced
the prime number theorem.”


Recently, Selberg sent me a letter which more precisely specifies the actual dates of events. 


Quote from Selberg’s letter to D. Goldfeld, January 6, 1998: 


“July 14, 1948 was a Wednesday, and on Thursday, July 15 I met Erdős
and heard that he was trying to prove $\frac{p_{n+1}}{p_n} \to 1$. I believe Turan left the
next day (Friday, July 16), at any rate whatever lecture he had given (and
I had not asked him to give one!) he had given before my return, and he
was not present nor played any part in later events. Friday evening or it
may have been Saturday morning, Erdős had his proof ready and told me
about it. Sunday afternoon (July 18) I used his result (which was stronger
than just $\frac{p_{n+1}}{p_n} \to 1$, he had proved that between $x$ and $x(1 + \delta)$ there are
more than $c(\delta)\frac{x}{\log(x)}$ primes for $x > x_0(\delta)$, the weaker result would not
have been sufficient for me) to get my first proof of the PNT. I told Erdős
about it the next morning (Monday, July 19). He then suggested that we
should talk about it that evening in the seminar room in Fuld Hall (as
I thought, to a small informal group of Chowla, Straus and a few others
who might be interested).”


In the same letter Selberg goes on to dispute Straus’ recollection of the events. 


“Turan’s lecture (probably a quite informal thing considering the small
group) could not have been later than July 14, since it was before my
return. Straus has speeded up events; Erdős told me he was trying to
prove $\frac{p_{n+1}}{p_n} \to 1$ on July 15. He told me he had a proof only late on July
16 or possibly earlier the next day. Straus’ quote is also clearly wrong for
the following reasons; first, I needed more than just $\frac{p_{n+1}}{p_n} \to 1$ for my first
proof of the PNT, second, I only saw how to do it on Sunday, July 18.
It is true, however, as Erdős’ and Straus’ stories indicate, that when I first
was told by Erdős that he was trying to prove $\frac{p_{n+1}}{p_n} \to 1$ from my formula,
I tried to discourage him, by saying that I doubted whether the formula
alone implied these things. I also said I had constructed a counterexample
showing that the relation in the form
\[ f(x)\log x + \int_{1}^{x} f\!\left(\frac{x}{t}\right) df(t) = 2x\log x + O(x) \]
by itself does not imply that $f(x) \sim x$. It was true, I did have such an
example. What I neglected to tell was that in this example $f(x)$ (though
positive and tending to infinity with $x$) was not monotonic! This conversation
took place either in the corridor of Fuld Hall or just outside Fuld Hall so
without access to a blackboard. This attempt to throw Erdős off the track
(clearly not succeeding!) is somewhat understandable given my mood at
the time.”


Quote from Selberg’s Paper, “An elementary proof of the prime-number theorem,” Annals 
Math. 1949 


“From the Fundamental Inequality there are several ways to deduce the
prime number theorem ... The original proof made use of the following
result of Erdős: $\frac{p_{n+1}}{p_n} \to 1$. Erdős’s result was obtained entirely independent
of my work.”


Selberg’s first proof that the prime number theorem followed from the fundamental
formula is given both in [E] and [S]. The crux of the matter goes something like this. We
may write the fundamental formula in the form
\[ \frac{\vartheta(x)}{x} + \sum_{p \le x} \frac{\vartheta(x/p)}{x/p} \cdot \frac{\log(p)}{p\log(x)} = 2 + O\!\left(\frac{1}{\log(x)}\right). \]

Recall that $a$ and $A$ are the limit inferior and limit superior, respectively, of $\frac{\vartheta(x)}{x}$.
Now, choose $x$ large so that $\frac{\vartheta(x)}{x}$ is near $A$. Since $a + A = 2$, it follows from the
fundamental formula and
\[ \sum_{p \le x} \frac{\log(p)}{p\log(x)} \sim 1 \]
that $\frac{\vartheta(x/p)}{x/p}$ must be near $a$ for most primes $p \le x$. If $S$ denotes the set of exceptional
primes, then we have
\[ \sum_{\substack{p \le x \\ p \in S}} \frac{\log(p)}{p} \Bigg/ \sum_{p \le x} \frac{\log(p)}{p} \to 0. \]

Now, choose a small prime $q \notin S$ such that $\frac{\vartheta(x/q)}{x/q}$ is near $a$. Rewriting the fundamental
formula with $x$ replaced by $x/q$, the same argument as above leads one to conclude that
$\frac{\vartheta(x/pq)}{x/pq}$ is near $A$ for most primes $p \le x/q$. It follows that $\vartheta(x/p) \sim ax/p$ for most primes
$p \le x$ and that $\vartheta(x/pq) \sim Ax/pq$ for most $p \le x/q$. A contradiction is obtained (using
Erdős’s idea of nonoverlapping intervals) unless $a = A = 1$.


The Erdős-Selberg dispute arose over the question of whether a joint paper (on
the entire proof) or separate papers (on each individual contribution) should
appear on the elementary proof of the PNT.


August 20, 1948: Quote from a letter of Selberg to Erdős.
“What I propose is the only fair thing: each of us can publish what he has
actually done and get the credit for that, and not for what the other has
done.

You proved that
\[ \lim_{n\to\infty} \frac{p_{n+1}}{p_n} = 1. \]
I would never have dreamed of forcing you to write a joint paper on this
in spite of the fact that the essential thing in the proof of the result was
mine.”
“Since there can be no reason for a joint paper, I am going to publish my
proof as it now is. I have the opinion, ... that I do you full justice by
telling in the paper that my original proof depended on your result.

In addition to this I offered you to withhold my proof so your theorem
could be published earlier (of course then without mentioning PNT).

I still offer you this ...
If you don’t accept this I publish my proof anyway.”


Sept. 16, 1948: Quote from Selberg’s letter to Weyl. 


“when I came to Syracuse I discovered gradually through various sources
that there had been made quite a publicity around the proof of the PNT. I
have myself actually mentioned it only in one letter to one of my brothers

Almost all the people whom the news had reached seemed to attribute the
proof entirely or at least essentially to Erdős, this was even the case with
people who knew my name and previous work quite well.”


Quote from E.G. Straus, “The elementary proof of the prime number theorem.”


“In fact I was told this story (I forget by whom) which may well not be
true ... When Selberg arrived in Syracuse he was met by a faculty
member with the greeting:

Have you heard the exciting news of what Erdős and some Scandinavian
mathematician have just done?”


Quote from Selberg’s letter to D. Goldfeld, January 6, 1998: 


“This is not true. What I did hear shortly after my arrival were some
reports (originating from the Boston-Cambridge area) where only Erdős
was mentioned. Later there were more such reports from abroad.”


Sept. 20, 1948: We quote from a second letter of Selberg to Erdős.


“I hope also that we will get some kind of agreement. But I cannot accept
any agreement with a joint paper.


How about the following thing. You publish your result, I publish my 
newest proof, but with a satisfactory sketch of the ideas of the first proof 
in the introduction, and referring to your result. I could make a thorough 
sketch on 2 pages, I think, and this would not make the paper much longer. 
If you like, I could send you a sketch of the introduction. 


I have thought to send my paper to the Annals of Math., they will certainly 
agree to take your paper earlier.” 


Sept. 27, 1948: Quote from Erdős’s letter to Selberg.


“I have to state that when I started to work on $\frac{p_{n+1}}{p_n} \to 1$ you were very
doubtful about success, in fact stated that you believe to be able to show
that the FUND. LEMMA does not imply the PNT (prime number theorem).

If you would have told me about what you know about $a$ and $A$, I would
have finished the proof of PNT on the spot.

Does it occur to you that if I would have kept the proof of $\frac{p_{n+1}}{p_n} \to 1$ to
myself (as you did with $a + A = 2$) and continued to work on PNT ... I
would soon have succeeded and then your share of PNT would have only
been the beautiful FUNDAMENTAL LEMMA.”


Sept. 27, 1948: Quote from Erdős’s letter to Selberg.
“I completely reject the idea of publishing only
\[ \lim_{n\to\infty} \frac{p_{n+1}}{p_n} = 1 \]
and feel just as strongly as before that I am fully entitled to a joint paper.
So if you insist on publishing your new proof all I can do is to publish our
simplified proof, giving you of course full credit for your share (stating that
you first obtained the PNT, using some of my ideas and my theorem).


Also, I will of course gladly submit the paper to Weyl first, if he is willing
to take the trouble of seeing that I am scrupulously fair to you.”


Quote from E.G. Straus: “The elementary proof of the prime number theorem.”


“It was Weyl who caused the Annals to reject Erdős’s paper and
published only a version by Selberg which circumvented Erdős’s contribution,
without mentioning the vital part played by Erdős in the first elementary
proof, or even the discovery of the fact that such a proof was possible.”


Quote from Selberg’s letter to D. Goldfeld, January 6, 1998: 


“This is wrong on several points, my paper mentioned and sketched in
some detail how Erdős’s result played a part and was used in the first
elementary proof of PNT, but that first proof was mine as surely as Erdős’
result was his. Also the discovery that such a proof was possible was surely
mine. After all, you don’t know that it is possible to prove something until
you have done so!”


Excerpt: Handwritten Note by Erdős:


“It was agreed that Selberg’s proof should be in the Annals of Math.,
mine in the Bulletin. Weyl was supposed to be the referee. To my great
surprise Jacobson the referee ... The Bulletin wrote that the referee does
not recommend my paper for publication.


I immediately withdrew the paper and planned to publish it in the JLMS
but ... had it published in the Proc. Nat. Acad.”


Feb. 15, 1949: Quote from H. Weyl’s letter to Jacobson 


“I had questioned whether Erdős has the right to publish things which
are admittedly Selberg’s ... I really think that Erdős’s behavior is quite
unreasonable, and if I were the responsible editor I think I would not be
afraid of rejecting his paper in this form.


But there is another aspect of the matter. It is probably not as easy as
Erdős imagines to have his paper published in time in this country if the
Bulletin rejects it ... So it may be better to let Erdős have his way. No
great harm can be done by that. Selberg may feel offended and protest
(and that would be his right), but I am quite sure that the two papers --
Selberg’s and Erdős’s together -- will speak in unmistakable language, and
that the one who has really done harm to himself will be Erdős.”


Quote from E.G. Straus: “The elementary proof of the prime number theorem.” 


“The elementary proof has so far not produced the exciting innovations in 
number theory that many of us expected to follow. So, what we witnessed 
in 1948, may in the course of time prove to have been a brilliant but 
somewhat incidental achievement without the historic significance it then 
appeared to have.” 


Quote from Selberg’s letter to D. Goldfeld, January 6, 1998: 


“With this last quote from Straus, I am in agreement (actually I did not 
myself expect any revolution from this). The idea of the local sieve, how- 
ever, has produced many things that have not been done by other meth- 
ods.” 


Remark: To this date, there have been no results obtained from the elementary proof of
the PNT that cannot be obtained in stronger form by other methods. Other elementary
methods introduced by both Selberg and Erdős have, however, led to many important
results in number theory not attainable by any other technique.


Dec. 4, 1997: Letter from Selberg to D. Goldfeld. 


“The material I have is nearly all from Herman Weyl’s files, and was given
to me probably in 1952 or 1953 as he was cleaning out much of his stuff
in Princeton, taking some to Zurich and probably discarding some. The
letters from Weyl to myself was all that I kept when I left Syracuse in
1949, all the rest I discarded. Thus there are gaps. Missing is my first
letter to Erdős as well as his reply to it ...


I did not save anything except letters from Weyl because I was rather
disgusted with the whole thing. I never lectured on the elementary proof
of the PNT after the lecture in Syracuse, mentioned in the first letter to
Herman Weyl. However, I did at Cornell U. early in 1949 and later at
an AMS meeting in Baltimore gave a lecture with an elementary proof
of (using the notation of Beurling generalized primes & integers) the fact
that if
\[ N(x) = Ax + o\!\left(\frac{x}{\log^{2}(x)}\right) \]
then
\[ \pi(x) = \frac{x}{\log(x)} + o\!\left(\frac{x}{\log(x)}\right). \]
Beurling has the same conclusion if
\[ N(x) = Ax + O\!\left(\frac{x}{(\log(x))^{\alpha}}\right) \]
with $\alpha > \frac{3}{2}$. I never published this.


Erdős of course lectured extensively in Amsterdam, Paris, and other places
in Europe. After his lecture in Amsterdam, Oct. 30, 1948, v.d. Corput
wrote up a paper, Scriptum 1, Mathematisch Centrum, which was the first
published version!”


References 
[A-B-G] K.E. Aubert, E. Bombieri, D. Goldfeld, Number theory, trace formulas and
discrete groups, symposium in honor of Atle Selberg, Academic Press Inc. Boston (1989).


[A-S] N. Alon, J. Spencer, The probabilistic method, John Wiley & Sons Inc., New York
(1992).

[B] H. Bohr, Address of Professor Harold Bohr, Proc. Internat. Congr. Math. (Cam- 
bridge, 1950) vol 1, Amer. Math. Soc., Providence, R.I., 1952, 127-134. 


[Ch] J. Chen, On the representation of a large even integer as the sum of a prime and the 
product of at most two primes, Sci. Sinica 16 (1973), 157-176. 


[C-G] F. Chung, R. Graham, Erdős on graphs: his legacy of unsolved problems, A.K.
Peters, Ltd., Wellesley, Massachusetts (1998). 


[C] L.W. Cohen, The annual meeting of the society, Bull. Amer. Math. Soc 58 (1952), 
159-160. 


[D] H.G. Diamond, Elementary methods in the study of the distribution of prime numbers, 
Bull. Amer. Math. Soc. vol. 7 number 3 (1982), 553-589. 


[E] P. Erdős, On a new method in elementary number theory which leads to an elementary
proof of the prime number theorem, Proc. Nat. Acad. Scis. U.S.A. 35 (1949), 374-384. 


[G] D. Goldfeld, The Erdős-Selberg dispute: file of letters and documents, to appear.


[H1] J. Hadamard, Étude sur les propriétés des fonctions entières et en particulier d'une
fonction considérée par Riemann, J. de Math. Pures Appl. (4) 9 (1893), 171-215; reprinted
in Oeuvres de Jacques Hadamard, C.N.R.S., Paris, 1968, vol 1, 103-147.


[H2] J. Hadamard, Sur la distribution des zéros de la fonction ζ(s) et ses conséquences
arithmétiques, Bull. Soc. Math. France 24 (1896), 199-220; reprinted in Oeuvres de
Jacques Hadamard, C.N.R.S., Paris, 1968, vol 1, 189-210.


[Ho] P. Hoffman, The man who loves only numbers, The Atlantic, November (1987). 


[I] A.E. Ingham, Review of the two papers: An elementary proof of the prime-number
theorem, by A. Selberg and On a new method in elementary number theory which leads
to an elementary proof of the prime number theorem, by P. Erdős. Reviews in Number
Theory as printed in Mathematical Reviews 1940-1972, Amer. Math. Soc. Providence, RI 
(1974). See N20-3, Vol. 4, 191-193. 


[La1] E. Landau, Handbuch der Lehre von der Verteilung der Primzahlen, Teubner, Leipzig
(1909), 2 volumes, reprinted by Chelsea Publishing Company, New York (1953). 


[La2] E. Landau, Über den Wienerschen neuen Weg zum Primzahlsatz, Sitzber. Preuss.
Akad. Wiss., 1932, 514-521. 


[Le1] A.M. Legendre, Essai sur la théorie des nombres, 1. Aufl. Paris (Duprat) (1798). 
[Le2] A.M. Legendre, Essai sur la théorie des nombres, 2. Aufl. Paris (Courcier) (1808). 


[S] A. Selberg, An elementary proof of the prime-number theorem, Ann. of Math. (2) 
50 (1949), 305-313; reprinted in Atle Selberg Collected Papers, Springer-Verlag, Berlin 
Heidelberg New York, 1989, vol 1, 379-387. 


[Syl1] J.J. Sylvester, On Tchebycheff's theorem of the totality of prime numbers comprised
within given limits, Amer. J. Math. 4 (1881), 230-247.


[Syl2] J.J. Sylvester, On arithmetical series, Messenger of Math. (2) 21 (1892), 1-19 and
87-120.


[Tch1] P.L. Tchebychef, Sur la fonction qui détermine la totalité des nombres premiers
inférieurs à une limite donnée, Mémoires présentés à l'Académie Impériale des Sciences de
St.-Pétersbourg par divers Savants et lus dans ses Assemblées, Bd. 6, S. (1851), 141-157.


[Tch2] P.L. Tchebychef, Mémoire sur les nombres premiers, J. de Math. Pures Appl. (1) 
17 (1852), 366-390; reprinted in Oeuvres 1 (1899), 49-70. 


[VP] C.J. de la Vallée Poussin, Recherches analytiques sur la théorie des nombres premiers, 
Ann. Soc. Sci. Bruxelles 20 (1896), 183-256. 


[W1] N. Wiener, A new method in Tauberian theorems, J. Math. Physics M.I.T. 7 (1927- 
28), 161-184. 


[W2] N. Wiener, Tauberian theorems, Ann. of Math. (2) 33 (1932), 1-100. 


ADDITIVE BASES REPRESENTATIONS
AND THE ERDŐS-TURÁN CONJECTURE


G. GREKOS, L. HADDAD, C. HELOU*, J. PIHKO


ABSTRACT. We give a lower bound for the maximal number of representations by an additive
basis of the natural numbers, in connection with a celebrated conjecture of Erdős and Turán.


INTRODUCTION 


Denote by $\mathbb{N} = \{0,1,2,\dots\}$ the set of natural numbers and consider a subset $A$ of $\mathbb{N}$.
The number $r(A,n)$ of representations of an element $n$ of $\mathbb{N}$ by $A$ is the number of ordered
pairs $(a,b) \in A \times A$ such that $a+b = n$. We will say that $A$ is a basis of $\mathbb{N}$ if $r(A,n) \ge 1$
for all $n \in \mathbb{N}$. Our main objective is the exploration of the following conjecture of P. Erdős
and P. Turán [3], dating back to 1941.


(ET): If A is a basis of N, then r(A,n) is unbounded, as n ranges through N. 
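
The representation function is easy to experiment with on finite sets. The following is a minimal sketch, not part of the original paper; the helper names `r` and `is_basis_up_to` are chosen here for illustration, and a finite set can of course only be tested as a basis of an initial interval.

```python
def r(A, n):
    """Number of ordered pairs (a, b) in A x A with a + b == n."""
    S = set(A)
    return sum(1 for a in S if (n - a) in S)


def is_basis_up_to(A, x):
    """True if every n in [0, x] has at least one representation by A."""
    return all(r(A, n) >= 1 for n in range(x + 1))


squares = {k * k for k in range(20)}

# 6 = 1 + 5 = 5 + 1 = 3 + 3, so three ordered representations.
print(r({0, 1, 3, 5}, 6))
print(is_basis_up_to({0, 1, 3, 5}, 6))
# The squares fail already at n = 3, which is not a sum of two squares.
print(is_basis_up_to(squares, 10))
```

Counting ordered pairs means that a representation $a + b$ with $a \ne b$ contributes 2, while $a + a$ contributes 1.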


There seems to be relatively little work concerned with this conjecture. Originally, Erdős
and Turán used some deep function theory to show that $r(A,n)$ does not become constant
for large $n$. But, in 1951, G. Dirac [1] proved this result by an elementary argument
concerning the parity of $r(A,n)$. In 1956, P. Erdős and W. Fuchs [4] established, using
Fourier series, that it is impossible to have $\sum_{k=0}^{n} r(A,k) = cn + o(n^{1/4}(\log n)^{-1/2})$. They
also showed that if $A = \{a_1 < a_2 < \cdots < a_n < \dots\}$ satisfies the condition $a_n \le Kn^2$ for
some constant $K > 0$ and all $n \in \mathbb{N}$ (which is true of every basis of $\mathbb{N}$), then for any $c > 0$,
one has $\limsup_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n} (r(A,k) - c)^2 > 0$. They further asserted the existence of a
sequence $A$ satisfying the same condition, for which $\limsup_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n} r(A,k)^2 < \infty$.
In 1990, I. Ruzsa [8] confirmed this assertion, by constructing a basis $A$ of $\mathbb{N}$ for which
$\sum_{k=0}^{n} r(A,k)^2 = O(n)$. In 1988, M. Dowd [2] gave a finite form of (ET) in $\mathbb{N}$, equivalent
to one of our formulations. His proof, using graph theory, was recently clarified and
generalized by M. Nathanson [7]. Dowd also indicated that the validity of (ET) in $\mathbb{Z}$
implies its validity in $\mathbb{N}$. But, in 2002, Nathanson [6] showed that (ET) is not valid in $\mathbb{Z}$,
by constructing arbitrarily sparse bases $A$ of $\mathbb{Z}$ for which $r(A,n)$ is at most 2 for all $n \in \mathbb{Z}$.

Here, we will introduce some functions, defined in terms of a variable bound $x$ in $\mathbb{N}$
and of the traces of all bases $A$ of $\mathbb{N}$ on the interval $[0, x]$, involving the values of $r(A,n)$.
This allows for equivalent formulations of the Erdős-Turán conjecture susceptible of partial
quantitative verifications. In particular, we deduce that for any basis $A$ of $\mathbb{N}$, the numbers
$r(A,n)$ must at least take values $\ge 6$. For more ample details, we refer to [5].


2000 Mathematics Subject Classification. 11B13.
*Presenter


D. Chudnovsky et al. (eds.), Number Theory
© Springer-Verlag New York, Inc. 2004


§1 SOME FUNCTIONS DESCRIBING FINITE TRACES OF BASES 


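For $x \in \mathbb{N}$ and $P \subset \mathbb{N}$, we write $P[x] = P \cap [0,x] = \{p \in P : p \le x\}$, and we set
$\rho(P,x) = \max\{r(P,n) : n \in \mathbb{N}[x]\}$. We also set $s(P) = \sup\{r(P,n) : n \in \mathbb{N}\}$; this is an
element of $\overline{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$.

The following properties are immediate but useful: If $y \in \mathbb{N}$ is such that $x \le y$, then
$\rho(P,x) \le \rho(P,y)$; and if $A \subset P$, then $\rho(A,x) \le \rho(P,x)$. Moreover, $\rho(P,x) = \rho(P[x],x)$.
Also, $s(P) = \lim_{x\to\infty} \rho(P,x)$.

A set $P$ will be called a basis of $\mathbb{N}[x]$ if $P \subset \mathbb{N}[x] \subset P+P$, where $P+P = \{p+q :
(p,q) \in P \times P\}$; whereas $P$ is a basis of $\mathbb{N}$ if $P+P = \mathbb{N}$. The set of all bases of $\mathbb{N}[x]$ will
be denoted by $\mathcal{B}(x)$, and that of all bases of $\mathbb{N}$ by $\mathcal{B}(\mathbb{N})$. Naturally, $P \in \mathcal{B}(\mathbb{N})$ if and only
if $P[x] \in \mathcal{B}(x)$ for every $x \in \mathbb{N}$.

The functions $\rho$ and $\tau$, from $\mathbb{N}$ into $\mathbb{N}^*$, are defined by $\rho(x) = \min\{\rho(P,x) : P \in \mathcal{B}(x)\}$
and $\tau(x) = \min\{s(P) : P \in \mathcal{B}(x)\}$.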


Lemma 1.

(1) The functions $\rho$ and $\tau$ are increasing.
(2) For every $x \in \mathbb{N}$, we have $\rho(x) \le \tau(x) \le \rho(2x)$.


Proof.

(1) Let $x, y \in \mathbb{N}$ be such that $x \le y$. For any $Q \in \mathcal{B}(y)$, we have $Q[x] \in \mathcal{B}(x)$ and
therefore $\rho(Q,y) \ge \rho(Q,x) = \rho(Q[x],x) \ge \rho(x)$ and $s(Q) \ge s(Q[x]) \ge \tau(x)$. Hence
$\rho(y) = \min\{\rho(Q,y) : Q \in \mathcal{B}(y)\} \ge \rho(x)$ and $\tau(y) = \min\{s(Q) : Q \in \mathcal{B}(y)\} \ge \tau(x)$.

(2) The inequality $\rho(x) \le \tau(x)$ follows from the obvious fact that $\rho(P,x) \le \rho(P,2x) =
s(P)$, for all $P \in \mathcal{B}(x)$. Moreover, for any $Q \in \mathcal{B}(2x)$, since $Q[x] \in \mathcal{B}(x)$, we have $\tau(x) \le
s(Q[x]) = \rho(Q[x],2x) \le \rho(Q,2x)$. Hence $\tau(x) \le \min\{\rho(Q,2x) : Q \in \mathcal{B}(2x)\} = \rho(2x)$. □
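
For very small $x$ the functions $\rho$ and $\tau$ can be evaluated by exhaustive enumeration of $\mathcal{B}(x)$. The sketch below is only an illustration under that assumption (it is exponential in $x$ and is not the authors' Maple computation); the function names are chosen here and do not come from the paper.

```python
from itertools import combinations


def rep_counts(P, limit):
    """List c with c[n] = r(P, n) for 0 <= n <= limit."""
    c = [0] * (limit + 1)
    for a in P:
        for b in P:
            if a + b <= limit:
                c[a + b] += 1
    return c


def bases(x):
    """Yield every P in B(x): P inside [0, x] and [0, x] inside P + P."""
    # 0 must belong to P, since 0 = 0 + 0 is its only representation.
    rest = range(1, x + 1)
    for k in range(x + 1):
        for extra in combinations(rest, k):
            P = (0,) + extra
            if all(v >= 1 for v in rep_counts(P, x)):
                yield P


def rho(x):
    return min(max(rep_counts(P, x)) for P in bases(x))


def tau(x):
    # For P inside [0, x], s(P) is the maximum of r(P, n) over n <= 2x.
    return min(max(rep_counts(P, 2 * x)) for P in bases(x))


for x in range(8):
    print(x, rho(x), tau(x))
```

The printed values can be compared with the tables of §4, and the inequalities of Lemma 1 can be checked on them directly.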


A third significant function is $\sigma : \mathbb{N}^* \to \mathbb{N}^*$, where $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$. It is defined by
$\sigma(n) = \min\{s(P) : P \in \mathcal{B}(\#n)\}$, where $\mathcal{B}(\#n) = \{P \subset \mathbb{N} : |P| = n$ and $P \in \mathcal{B}(\max P)\}$.
A finite, non-empty subset $P$ of $\mathbb{N}$ such that $P \in \mathcal{B}(\max P)$ will simply be called a finite
basis, so that $\mathcal{B}(\#n)$ is the set of all finite bases having exactly $n$ elements. Furthermore,
we will call successor of a finite basis $P$ every finite basis $Q$ obtained by adjoining to $P$
an element $q > \max P$, so that $Q = P \cup \{q\}$ and $Q \in \mathcal{B}(q)$. Clearly, $Q = P \cup \{q\}$ is a
successor of $P$ if and only if $\max P + 1 \le q \le h$, where $h = \min(\mathbb{N} \setminus (P+P))$. It is also
easy to see that $\mathcal{B}(\#(n+1))$ consists exactly of the successors of the elements of $\mathcal{B}(\#n)$.


Lemma 2.

(1) The function $\sigma$ is increasing.
(2) For any $n \in \mathbb{N}^*$, we have $1 \le \sigma(n) \le n$.


Proof.

(1) For any $n \in \mathbb{N}^*$ and any $Q \in \mathcal{B}(\#(n+1))$, there is some $P \in \mathcal{B}(\#n)$ such that
$Q$ is a successor of $P$, namely $P = Q \setminus \{\max Q\}$. Then $s(Q) \ge s(P) \ge \sigma(n)$. Hence
$\sigma(n+1) = \min\{s(Q) : Q \in \mathcal{B}(\#(n+1))\} \ge \sigma(n)$.

(2) For any finite subset $P$ of $\mathbb{N}$ and any $n \in \mathbb{N}$, the number $r(P,n)$ of ordered pairs
$(p, n-p)$ such that $p \in P$ and $n - p \in P$ cannot exceed the cardinality $|P|$ of $P$. Therefore $s(P) \le |P|$.
In particular, if $P \in \mathcal{B}(\#n)$, then $s(P) \le |P| = n$. Hence $\sigma(n) \le n$. Moreover, by (1)
above, $\sigma(n) \ge \sigma(1) = 1$. □
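
The successor relation gives a direct way to generate all finite bases with a given number of elements, starting from the single one-element basis {0}. The following sketch assumes this level-by-level generation; the names `successors`, `finite_bases` and `sigma` are illustrative, and the approach is only practical for small cardinalities.

```python
def s(P):
    """Largest value of r(P, n) over all n, counting ordered pairs."""
    counts = {}
    for a in P:
        for b in P:
            counts[a + b] = counts.get(a + b, 0) + 1
    return max(counts.values())


def successors(P):
    """All finite bases obtained from the finite basis P by adjoining q > max P."""
    sums = {a + b for a in P for b in P}
    h = 0
    while h in sums:  # h = least natural number not in P + P
        h += 1
    return [P + (q,) for q in range(max(P) + 1, h + 1)]


def finite_bases(n):
    """All finite bases with exactly n elements, as sorted tuples."""
    level = [(0,)]
    for _ in range(n - 1):
        level = [Q for P in level for Q in successors(P)]
    return level


def sigma(n):
    return min(s(P) for P in finite_bases(n))


for n in range(1, 8):
    print(n, len(finite_bases(n)), sigma(n))
```

Since each finite basis $Q$ with at least two elements is a successor of exactly one basis, namely $Q \setminus \{\max Q\}$, no basis is produced twice.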


Lemma 3. For any $x \in \mathbb{N}$, if $P \in \mathcal{B}(x)$, then $|P| \ge (\sqrt{8x+9} - 1)/2$.


Proof. Let $n = |P|$ and $P = \{p_1, \dots, p_n\}$, where $p_1 < \cdots < p_n$. Since $\mathbb{N}[x] \subset P+P$, we
have $|\mathbb{N}[x]| = x+1 \le |P+P|$. Moreover, $P+P = \{p_i + p_j : 1 \le i \le j \le n\}$ and therefore
$|P+P| \le \sum_{j=1}^{n} j = n(n+1)/2$. Thus $x+1 \le n(n+1)/2$, i.e. $n^2 + n - 2(x+1) \ge 0$, which
implies that $n \ge (-1 + \sqrt{8x+9})/2$ (the positive root of the quadratic equation). □

Corollary. For any $n \in \mathbb{N}^*$, we have $\tau(n-1) \le \sigma(n) \le \tau(n(n+1)/2 - 1)$.


Proof. For any $P \in \mathcal{B}(\#n)$, since $P \in \mathcal{B}(\max P)$, we have $s(P) \ge \tau(\max P)$; and since
$P$ has $n$ elements, we have $\max P \ge n-1$, so that, $\tau$ being an increasing function,
$s(P) \ge \tau(\max P) \ge \tau(n-1)$. Hence $\sigma(n) = \min\{s(P) : P \in \mathcal{B}(\#n)\} \ge \tau(n-1)$.

On the other hand, if $x = n(n+1)/2 - 1$ then, by Lemma 3, for any $P \in \mathcal{B}(x)$, we
have $|P| \ge n$. Thus, $\sigma$ being an increasing function, $s(P) \ge \sigma(|P|) \ge \sigma(n)$. Hence
$\tau(n(n+1)/2 - 1) = \min\{s(P) : P \in \mathcal{B}(x)\} \ge \sigma(n)$. □
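
The size bound of Lemma 3 can be compared with the true minimal size of a basis of $\mathbb{N}[x]$ for small $x$. This is a brute-force sketch written for illustration only; `min_basis_size` is a hypothetical helper, not something taken from the paper.

```python
from itertools import combinations
from math import sqrt


def covers(P, x):
    """True if every n in [0, x] is a sum of two elements of P."""
    sums = {a + b for a in P for b in P}
    return all(n in sums for n in range(x + 1))


def min_basis_size(x):
    """Smallest cardinality of a set P inside [0, x] with [0, x] inside P + P."""
    for k in range(1, x + 2):
        # Every such P contains 0, so only the other k - 1 elements vary.
        for extra in combinations(range(1, x + 1), k - 1):
            if covers((0,) + extra, x):
                return k


for x in range(13):
    bound = (sqrt(8 * x + 9) - 1) / 2
    print(x, min_basis_size(x), round(bound, 3))
```

The bound counts unordered pairs only, so it need not be attained; the output shows how far the minimal size is from it in this small range.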


The problem considered here can be more generally stated as the determination of the
element $\lambda = \inf\{s(P) : P \in \mathcal{B}(\mathbb{N})\}$ of $\overline{\mathbb{N}}$. Indeed, the Erdős-Turán conjecture amounts to
the assertion that $\lambda = \infty$.


Lemma 4. We have $\lim_{x\to\infty} \rho(x) = \lim_{x\to\infty} \tau(x) = \lim_{x\to\infty} \sigma(x) \le \lambda$.


Proof. Since the functions $\rho$, $\tau$ and $\sigma$ are increasing, they all have limits in $\overline{\mathbb{N}}$, as $x \to \infty$.
The first equality then follows from Lemma 1, (2), and the second equality follows from
the above Corollary. Moreover, for any $x \in \mathbb{N}$ and any $P \in \mathcal{B}(\mathbb{N})$, since $P[x] \in \mathcal{B}(x)$, we
have $\tau(x) \le s(P[x]) \le s(P)$. Hence $\tau(x) \le \lambda$, for all $x \in \mathbb{N}$, and thus $\lim_{x\to\infty} \tau(x) \le \lambda$. □


§2 EQUIVALENT FORMULATIONS OF THE ERDŐS-TURÁN CONJECTURE

We will need the following important set-theoretic notions and results. 


Definition. By an infinite family of subsets of $\mathbb{N}$, we will mean any family $\mathcal{P} = (P_i)_{i \in I}$
of subsets of $\mathbb{N}$ whose index set $I$ is infinite. Furthermore, a subset $A$ of $\mathbb{N}$ will be called a
diagonal of such a family $\mathcal{P}$ if, for any $n \in \mathbb{N}$, there are infinitely many indices $i \in I$ such
that $P_i$ has the same trace as $A$ on the interval $[0,n]$, i.e. such that $A[n] = P_i[n]$.


For instance, if $\mathcal{P} = (P_i)_{i \in I}$ is an infinite increasing, for inclusion, sequence of subsets
of $\mathbb{N}$ (i.e. $I \subset \mathbb{N}$ and for $i, j \in I$, if $i < j$ then $P_i \subset P_j$), then $\bigcup_{i \in I} P_i$ is the only diagonal of
$\mathcal{P}$. But in general, it is not obvious whether an arbitrary infinite family $\mathcal{P}$ has a diagonal.

As a matter of notation, for any set $X$, we denote by $\mathcal{S}(X)$ the set of all subsets of $X$.


Lemma 5 (The Diagonal Lemma). Every infinite family $\mathcal{P} = (P_i)_{i \in I}$ of subsets of $\mathbb{N}$
has at least one diagonal $A \subset \mathbb{N}$.


Proof. We construct, by induction on $n$, an increasing sequence of subsets $A_n$ of $\mathbb{N}$ and
a decreasing sequence of infinite subsets $I_n$ of $I$ such that for any $n \in \mathbb{N}$ and any $i \in I_n$,
we have $A_n = P_i[n]$. Then we let $A = \bigcup_{n \in \mathbb{N}} A_n$ and we verify that $A$ is a diagonal of the
family $\mathcal{P}$.

For $n = 0$, let $F_0 : I \to \mathcal{S}(\{0\})$ be the map defined by $F_0(i) = P_i[0]$, for all $i \in I$. Since
$I = F_0^{-1}(\emptyset) \cup F_0^{-1}(\{0\})$ is an infinite set, one at least of the two sets $\emptyset$ or $\{0\}$, that
we call $A_0$, has an infinite preimage $I_0 = F_0^{-1}(A_0)$. For $n \in \mathbb{N}^*$, we assume constructed
$2n$ subsets $A_0 \subset \cdots \subset A_{n-1} \subset \mathbb{N}$ and $I_{n-1} \subset \cdots \subset I_0 \subset I$, such that $I_{n-1}$ is infinite and
$A_j = P_i[j]$ for all $i \in I_j$ and $0 \le j \le n-1$. Let $F_n : I_{n-1} \to \mathcal{S}(\mathbb{N}[n])$ be the map defined
by $F_n(i) = P_i[n]$, for all $i \in I_{n-1}$. Since $\mathcal{S}(\mathbb{N}[n])$ is finite and $I_{n-1} = \bigcup_{X \in \mathcal{S}(\mathbb{N}[n])} F_n^{-1}(X)$
is infinite, there exists an element $A_n \in \mathcal{S}(\mathbb{N}[n])$ such that $I_n = F_n^{-1}(A_n)$ is an infinite
subset of $I_{n-1}$. Since $I_n \subset I_{n-1}$, then, for $i \in I_n$, we have $A_{n-1} = P_i[n-1] \subset P_i[n] = A_n$.
This completes the construction by induction.

Now, letting $A = \bigcup_{k \in \mathbb{N}} A_k$, we have $A[n] = \bigcup_{k \in \mathbb{N}} A_k[n]$, for all $n \in \mathbb{N}$. But since the
sequence $(A_k)$ is increasing, for $0 \le k \le n$, we have $A_k[n] \subset A_n[n] = A_n$. Moreover, for
$k > n$, since $I_k \subset I_n$, for any $i \in I_k$, we have $A_k[n] = (P_i[k])[n] = P_i[n] = A_n$. Hence
$A[n] = A_n = P_i[n]$, for all $n$ in $\mathbb{N}$ and all $i$ in the infinite set $I_n$. Thus $A$ is a diagonal of
the family $\mathcal{P}$. □


Corollary 1. Let $\mathcal{P} = (P_i)_{i \in I}$ be an infinite family of subsets of $\mathbb{N}$, with $I \subset \mathbb{N}$, and let
$A$ be a diagonal of this family.
(1) If $P_i \in \mathcal{B}(i)$ for all $i \in I$, then $A \in \mathcal{B}(\mathbb{N})$.
(2) If $s(P_i) \le s$, for some $s \in \mathbb{N}$ and all $i \in I$, then $s(A) \le s$.
(3) If $P_i \in \mathcal{B}(i)$ and $s(P_i) = \tau(i)$ for all $i \in I$, then $A \in \mathcal{B}(\mathbb{N})$ and we have $s(A) =
\lim_{x\to\infty} \tau(x) = \lambda$.


Proof. (1) and (2). For every $n \in \mathbb{N}$, there is an infinite subset $I_n$ of $I$ such that for all
$i \in I_n$, we have $A[n] = P_i[n]$ and therefore $r(A,n) = r(P_i,n)$. Thus if $P_i \in \mathcal{B}(i)$ (resp.
$s(P_i) \le s$) for all $i \in I$, then, choosing $i \ge n$ in $I_n$, we get $r(A,n) = r(P_i,n) \ge 1$ (resp.
$r(A,n) = r(P_i,n) \le s(P_i) \le s$). Since this holds for every $n \in \mathbb{N}$, we conclude that
$A \in \mathcal{B}(\mathbb{N})$ (resp. $s(A) \le s$).

(3) If $P_i \in \mathcal{B}(i)$ and $s(P_i) = \tau(i)$, for all $i \in I$, then, by (1), $A \in \mathcal{B}(\mathbb{N})$. Moreover, for any
$n \in \mathbb{N}$, there is an infinite subset $I_n$ of $I$ such that for all $i \in I_n$, we have $A[n] = P_i[n]$ and
therefore $r(A,n) = r(P_i,n) \le s(P_i) = \tau(i) \le \lim_{x\to\infty} \tau(x)$, since $\tau$ is an increasing function.
It follows that $s(A) \le \lim_{x\to\infty} \tau(x)$. On the other hand, by Lemma 4 and the definition of $\lambda$,
we have $\lim_{x\to\infty} \tau(x) \le \lambda \le s(A)$. Hence the equalities. □


Corollary 2. We have $\lim_{x\to\infty} \rho(x) = \lim_{x\to\infty} \tau(x) = \lim_{x\to\infty} \sigma(x) = \lambda$.


Proof. For every $i \in \mathbb{N}$, choose from the finite set $\mathcal{B}(i)$ an element $P_i$ at which the map
$P \mapsto s(P)$ attains its minimum $\tau(i)$. Then the family $\mathcal{P} = (P_i)_{i \in \mathbb{N}}$ satisfies the conditions
$P_i \in \mathcal{B}(i)$ and $s(P_i) = \tau(i)$ for all $i \in \mathbb{N}$. Hence, by Corollary 1, (3), $\lim_{x\to\infty} \tau(x) = \lambda$. The
other equalities result from Lemma 4. □


Theorem 1. The following statements are equivalent:
(ET): If $A$ is a basis of $\mathbb{N}$, then $s(A) = \infty$; i.e. $\lambda = \infty$.
(ET$_\rho$): $\lim_{x\to\infty} \rho(x) = \infty$.
(ET$_\tau$): $\lim_{x\to\infty} \tau(x) = \infty$.
(ET$_\sigma$): $\lim_{x\to\infty} \sigma(x) = \infty$.


Proof. This is an immediate consequence of the above Corollary 2. □


§3 ON THE GROWTH OF THE FUNCTIONS $\rho$, $\tau$ AND $\sigma$


We will need an auxiliary function $\alpha : \mathbb{N} \to \mathbb{N}$. For $x \in \mathbb{N}$ and for a subset $P$ of
$\mathbb{N}$, we first set $\alpha(P,x) = \max\{r(P,n) + r(P,n+x+1) : n \in \mathbb{N}[x]\}$; we then define
$\alpha(x) = \min\{\alpha(P,x) : P \in \mathcal{B}(x)\}$.


Lemma 6. For any $x \in \mathbb{N}$, we have $\rho(x) \le \tau(x) \le \alpha(x) \le \min(2\tau(x), [(x+3)/2])$, where
$[r]$ denotes the integral part of the real number $r$.


Proof. For any $P \subset \mathbb{N}$, we have $\alpha(P,x) = \max\{r(P,n) + r(P,n+x+1) : n \in \mathbb{N}[x]\} \ge
r(P,k)$, for $0 \le k \le 2x+1$. It follows that $\alpha(P,x) \ge \rho(P,2x+1)$. Moreover, $\alpha(P,x) \le
\max\{r(P,n) : n \in \mathbb{N}[x]\} + \max\{r(P,n+x+1) : n \in \mathbb{N}[x]\} \le \rho(P,x) + \rho(P,2x+1) \le 2s(P)$.
In particular, if $P \subset \mathbb{N}[x]$, then $\rho(P,2x+1) = s(P)$ and therefore $s(P) \le \alpha(P,x) \le 2s(P)$.
Taking the minimum, as $P$ ranges through $\mathcal{B}(x)$, of all sides in the latter inequalities, we
get $\tau(x) \le \alpha(x) \le 2\tau(x)$. In addition, by Lemma 1, $\rho(x) \le \tau(x)$. This yields all the
desired inequalities except one; so there only remains to show that $\alpha(x) \le [(x+3)/2]$.

Now, let $m = [(x+1)/2]$ and $A = \mathbb{N}[m]$. Then $A \in \mathcal{B}(x)$ and thus $\alpha(x) \le \alpha(A,x)$.
Moreover, $\alpha(A,x) \le \max\{r(A,n) : n \in \mathbb{N}[x]\} = \rho(A,x)$. Indeed, if $n \ge 1$, then $n+x+1 >
2m$ and thus $r(A,n+x+1) = 0$; while if $n = 0$, then $r(A,0) + r(A,x+1) = r(A,x)$,
since both sides are equal to 1 or 2 according as $x = 2m$ or $x = 2m-1$ respectively.
Furthermore, $s(A) = r(A,m) = m+1$, since it can be easily checked that $r(A,n) = n+1$
if $0 \le n \le m$ and $r(A,n) = \max(2m-n+1, 0)$ if $n \ge m+1$. Therefore $\alpha(x) \le \alpha(A,x) \le
\rho(A,x) \le s(A) = m+1 = [(x+3)/2]$, which completes the chain of inequalities. □


Another concept that we need is that of generating power series. For every subset $P$ of $\mathbb{N}$,
there is an associated formal power series $f_P(X) = \sum_{p \in P} X^p = \sum_{n=0}^{\infty} \chi(P,n) X^n$, where
$\chi(P,\cdot)$ is the characteristic function of $P$, defined by $\chi(P,n) = 1$ if $n \in P$ and $\chi(P,n) = 0$ if
$n \in \mathbb{N} \setminus P$. The square of this series $g_P(X) = f_P(X)^2 = \sum_{n=0}^{\infty} r(P,n) X^n$ is the generating
series of the sequence $(r(P,n))_{n \in \mathbb{N}}$, since $r(P,n) = \sum_{k=0}^{n} \chi(P,k)\chi(P,n-k)$, for any $n \in \mathbb{N}$.
For instance, if $A = \mathbb{N}[m]$, where $m \in \mathbb{N}$, then $f_A(X) = \sum_{k=0}^{m} X^k = (1 - X^{m+1})/(1 - X)$
and therefore $g_A(X) = (1 - X^{m+1})^2/(1 - X)^2 = (1 - 2X^{m+1} + X^{2m+2}) \sum_{n=0}^{\infty} (n+1) X^n =
\sum_{n=0}^{m} (n+1) X^n + \sum_{n=m+1}^{2m} (2m-n+1) X^n$, which gives the values of $r(A,n)$ for all $n \in \mathbb{N}$
as noted in the proof of Lemma 6.
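
For a finite set, squaring the generating polynomial is a convenient way to obtain all representation counts at once. The sketch below does this with plain coefficient lists (no external library is assumed) and compares the result for an initial interval with the closed form just stated; `square_coeffs` is a name chosen here.

```python
def square_coeffs(P):
    """Coefficients of g_P(X) = f_P(X)^2, i.e. the list [r(P, 0), r(P, 1), ...]."""
    top = max(P)
    f = [1 if n in P else 0 for n in range(top + 1)]
    g = [0] * (2 * top + 1)
    for i, fi in enumerate(f):
        for j, fj in enumerate(f):
            g[i + j] += fi * fj
    return g


m = 4
g = square_coeffs(set(range(m + 1)))
closed_form = [n + 1 if n <= m else 2 * m - n + 1 for n in range(2 * m + 1)]
print(g)
print(g == closed_form)
```

The coefficients rise from 1 to $m+1$ and then fall back to 1, which is the triangular profile used in the proof of Lemma 6.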


Lemma 7. Let $A$, $B$ be two subsets of $\mathbb{N}$ and $d \in \mathbb{N}$ be such that $A$ is finite and $d > \max A$.
Let $C = A + d * B = \{a + db : a \in A, b \in B\}$. Then
(1) $f_C(X) = f_A(X) f_B(X^d)$.
(2) For any $n \in \mathbb{N}$, there exist unique integers $q, e \in \mathbb{N}$ such that $n = dq + e$ and
$0 \le e < d$, and we have $r(C,n) = r(A,e) r(B,q) + r(A,d+e) r(B,q-1)$.
(3) For any $x \in \mathbb{N}$, we have $\rho(C,x) \le \alpha(A,d-1) \rho(B,[x/d])$.


Proof. (1) We have $C = \bigcup_{a \in A} (a + d * B)$, where the sets $a + d * B = \{a\} + d * B$ are two by
two disjoint. Indeed, if for $a, a' \in A$, with $a \le a'$, there is some $c \in (a + d * B) \cap (a' + d * B)$,
then $c = a + db = a' + db'$, with $b, b' \in B$, and therefore $0 \le a' - a = d(b - b') \le
a' \le \max A < d$, which is only possible if $b - b' = 0$, i.e. $a = a'$. It follows that
$f_C(X) = \sum_{a \in A} \sum_{b \in B} X^{a+db} = \sum_{a \in A} X^a f_B(X^d) = f_A(X) f_B(X^d)$.

(2) Squaring the relation in (1), we get $g_C(X) = g_A(X) g_B(X^d)$, i.e. $\sum_{n=0}^{\infty} r(C,n) X^n =
\left(\sum_{n=0}^{\infty} r(A,n) X^n\right) \left(\sum_{n=0}^{\infty} r(B,n) X^{dn}\right)$. Identifying the coefficients of $X^n$ on both sides,
for $n \in \mathbb{N}$, we get $r(C,n) = \sum_{j,k} r(A,j) r(B,k)$, where the summation is over all $(j,k) \in
\mathbb{N} \times \mathbb{N}$ such that $j + dk = n$. The latter sum can be restricted to $0 \le j \le 2m$, where
$m = \max A$, since $r(A,j) = 0$ for $j > 2m$. Now, for a given $n \in \mathbb{N}$, the existence and
uniqueness of $q$ and $e$ result from the Euclidean division of $n$ by $d$. Since $n = dq + e$, with
$0 \le e < d$, the condition $j + dk = n$ amounts to $j = d(q-k) + e$, with the restriction
$0 \le d(q-k) + e \le 2m < 2d$, so that $0 \le q - k < 2$, i.e. either $k = q$, $j = e$ or
$k = q-1$, $j = d+e$. Therefore $r(C,n) = r(A,e) r(B,q) + r(A,d+e) r(B,q-1)$.

(3) Let $x = du + v$, where $u, v \in \mathbb{N}$ satisfy $0 \le v < d$ and are uniquely determined by
Euclidean division. Then for $0 \le n \le x$, similarly expressed by $n = dq + e$, with $q, e \in \mathbb{N}$
and $0 \le e < d$, we have $0 \le q \le u$, so that $r(B,q-1)$ and $r(B,q)$ are $\le \rho(B,u)$. Therefore,
by (2), $r(C,n) = r(A,e) r(B,q) + r(A,d+e) r(B,q-1) \le (r(A,e) + r(A,d+e)) \rho(B,u)$,
for all $0 \le n \le x$. Hence $\rho(C,x) \le \max\{r(A,e) + r(A,d+e) : 0 \le e \le d-1\} \rho(B,u) =
\alpha(A,d-1) \rho(B,u)$, which is the stated inequality, since $u = [x/d]$. □
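
Part (2) of Lemma 7 can be checked numerically on a small instance. In the sketch below the sets `A`, `B` and the modulus `d` are arbitrary illustrative choices subject only to $d > \max A$; the helper names are not from the paper.

```python
def r(P, n):
    """Number of ordered pairs (a, b) in P x P with a + b == n."""
    S = set(P)
    return sum(1 for a in S if (n - a) in S)


def dilate_sum(A, d, B):
    """C = A + d*B = {a + d*b : a in A, b in B}."""
    return {a + d * b for a in A for b in B}


A = {0, 1, 3}
B = {0, 1, 2, 5}
d = 4  # must exceed max(A)
C = dilate_sum(A, d, B)

for n in range(2 * max(C) + 1):
    q, e = divmod(n, d)  # n = d*q + e with 0 <= e < d
    rhs = r(A, e) * r(B, q) + r(A, d + e) * r(B, q - 1)
    assert r(C, n) == rhs

print(sorted(C))
```

Here `r(B, -1)` evaluates to 0, which matches the convention implicit in the lemma when $q = 0$.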


Lemma 8. Let $x, y \in \mathbb{N}$, $A \in \mathcal{B}(x)$ and $B \in \mathcal{B}(y)$, and let $C = A + (x+1) * B$. Then

(1) $C \in \mathcal{B}(xy+x+y)$.
(2) $\rho(C, xy+x+y) \le \alpha(A,x) \rho(B,y)$.
(3) $s(C) \le \alpha(A,x) s(B)$.


Proof.

(1) Let $d = x+1$ and $z = xy+x+y$. Then $C = A + d * B$ and $z = dy + x$, with $0 \le x < d$;
furthermore $0 \le \max A \le x < d$. So, by Lemma 7, for any $0 \le n \le z$, if $n = dq + e$ with
$0 \le e < d$, then $0 \le q \le y$ and $r(C,n) = r(A,e) r(B,q) + r(A,d+e) r(B,q-1)$. Since
$A \in \mathcal{B}(x)$ and $e \le d-1 = x$, then $r(A,e) \ge 1$ and since $B \in \mathcal{B}(y)$ and $q \le y$, then
$r(B,q) \ge 1$. Therefore $r(C,n) \ge r(A,e) r(B,q) \ge 1$, for all $0 \le n \le z$, i.e. $\mathbb{N}[z] \subset C + C$.
Moreover, $C = A + d * B \subset \mathbb{N}[x] + (x+1) * \mathbb{N}[y] \subset \mathbb{N}[z]$. Hence $C \in \mathcal{B}(z)$.

(2) By Lemma 7, and since $d-1 = x$ and $[z/d] = y$, we have $\rho(C,z) \le \alpha(A,x) \rho(B,y)$.

(3) Since $C \subset \mathbb{N}[z]$, we have $s(C) = \rho(C,2z)$. But, by Lemma 7, we have $\rho(C,2z) \le
\alpha(A,x) \rho(B,[2z/d]) \le \alpha(A,x) s(B)$. Hence $s(C) \le \alpha(A,x) s(B)$. □


Theorem 2. For any $x, y \in \mathbb{N}$, we have

(1) $\rho(xy+x+y) \le \alpha(x) \rho(y)$.
(2) $\tau(xy+x+y) \le \alpha(x) \tau(y)$.


Proof. Let $z = xy+x+y$. By Lemma 8, for any $A \in \mathcal{B}(x)$ and any $B \in \mathcal{B}(y)$, there exists
$C \in \mathcal{B}(z)$ satisfying $\rho(z) \le \rho(C,z) \le \alpha(A,x) \rho(B,y)$ and $\tau(z) \le s(C) \le \alpha(A,x) s(B)$.
Hence $\rho(z) \le \min\{\alpha(A,x) : A \in \mathcal{B}(x)\} \cdot \min\{\rho(B,y) : B \in \mathcal{B}(y)\} = \alpha(x) \cdot \rho(y)$ and
similarly $\tau(z) \le \min\{\alpha(A,x) : A \in \mathcal{B}(x)\} \cdot \min\{s(B) : B \in \mathcal{B}(y)\} = \alpha(x) \cdot \tau(y)$. □
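
The composition of Lemma 8 is easy to try out: take a basis of $\mathbb{N}[x]$ and a basis of $\mathbb{N}[y]$, form $C = A + (x+1) * B$, and check the three conclusions. The sets used below are small illustrative choices, and `counts` and `alpha_of` are names introduced here.

```python
def counts(P, limit):
    """List c with c[n] = r(P, n) for 0 <= n <= limit."""
    c = [0] * (limit + 1)
    for a in P:
        for b in P:
            if a + b <= limit:
                c[a + b] += 1
    return c


def alpha_of(P, x):
    """alpha(P, x) = max of r(P, n) + r(P, n + x + 1) over 0 <= n <= x."""
    c = counts(P, 2 * x + 1)
    return max(c[n] + c[n + x + 1] for n in range(x + 1))


x, A = 5, {0, 1, 3, 5}  # A is a basis of [0, 5]
y, B = 4, {0, 1, 3}     # B is a basis of [0, 4]
z = x * y + x + y
C = {a + (x + 1) * b for a in A for b in B}

is_basis = max(C) <= z and min(counts(C, z)) >= 1
rho_C = max(counts(C, z))
s_C = max(counts(C, 2 * z))
rho_B = max(counts(B, y))
s_B = max(counts(B, 2 * y))

print(sorted(C), z, is_basis)
print(rho_C, s_C, alpha_of(A, x), rho_B, s_B)
```

The second printed line lets one compare $\rho(C,z)$ and $s(C)$ with the products $\alpha(A,x)\rho(B,y)$ and $\alpha(A,x)s(B)$.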


Corollary. For any $x, y \in \mathbb{N}$, we have

(1) $\rho(xy+x+y) \le 2\tau(x)\rho(y)$, and also $\rho(xy+x+y) \le [(x+3)/2]\rho(y)$.
(2) $\tau(xy+x+y) \le 2\tau(x)\tau(y)$, and also $\tau(xy+x+y) \le [(x+3)/2]\tau(y)$.


Proof. The inequalities follow immediately from Theorem 2 and Lemma 6. □


Proposition. For any $x \in \mathbb{N}^*$, if $\rho(x+1) > \rho(x)$, then $\tau(x) > \rho(x)$.


Proof. Assume that $\rho(x+1) > \rho(x)$. If $B \in \mathcal{B}(x)$ is such that $\rho(B,x) = \rho(x)$, then
$B \in \mathcal{B}(x+1)$; for otherwise, we would have $r(B,x+1) = 0$ and therefore $C = B \cup
\{x+1\}$ would lie in $\mathcal{B}(x+1)$ and satisfy $r(C,x+1) = 2$, so that $\rho(x+1) \le \rho(C,x+1)
= \max(\rho(B,x), r(C,x+1)) = \rho(B,x) = \rho(x) < \rho(x+1)$, in contradiction with the
assumption. It follows that if $B \in \mathcal{B}(x)$ is such that $\rho(B,x) = \rho(x)$, then $s(B) > \rho(x)$;
indeed, $B$ being in $\mathcal{B}(x+1)$, we have $s(B) \ge \rho(B,x+1) \ge \rho(x+1) > \rho(x)$, by the
assumption. Therefore, for all $B \in \mathcal{B}(x)$, we have $s(B) > \rho(x)$; indeed, this was proved if $B$
satisfies the condition $\rho(B,x) = \rho(x)$, and if it does not satisfy this condition we would have
$s(B) \ge \rho(B,x) > \rho(x)$. We thus conclude that $\tau(x) = \min\{s(B) : B \in \mathcal{B}(x)\} > \rho(x)$. □


§4 NUMERICAL RESULTS


For $x \in \mathbb{N}$, a basis $B \in \mathcal{B}(x)$ is called $\rho$-optimal (resp. $\tau$-optimal) if $\rho(B,x) = \rho(x)$ (resp.
$s(B) = \tau(x)$); also, if $B \in \mathcal{B}(\#x)$ is such that $s(B) = \sigma(x)$, it is called a $\sigma$-optimal basis
of cardinality $x$. The set of $\rho$-optimal (resp. $\tau$-optimal) bases in $\mathcal{B}(x)$ is written $\mathcal{O}(\rho,x)$
(resp. $\mathcal{O}(\tau,x)$), and the set of $\sigma$-optimal bases of cardinality $x$ is written $\mathcal{O}(\sigma,x)$. To
compute $\rho(x)$, $\tau(x)$ or $\sigma(x)$, we have to determine at least one corresponding optimal basis.
But the number of bases grows so rapidly with $x$ that exhaustive searches are impossible
for large $x$. For instance, $|\mathcal{B}(15)| = 8134$, while $|\mathcal{O}(\rho,15)| = 155$ and $|\mathcal{O}(\tau,15)| = 102$;
also, $|\mathcal{B}(18)| = 63910$. On the other hand, $|\mathcal{B}(\#10)| = 47098$. As to examples of
such bases, we mention $B = \{0,1,2,5,8,11\}$, which is in $\mathcal{O}(\rho,13) \cap \mathcal{O}(\tau,13) \cap \mathcal{O}(\sigma,6)$,
and also $B = \{0,1,2,3,4,5,7,9,11,15,21,26,34,35,39,46,54,62,72,79,89,94,101,110,
128,137,150,153,166,182,193,206,218\}$, which is in $\mathcal{O}(\rho,223) \cap \mathcal{O}(\tau,223) \cap \mathcal{O}(\sigma,33)$.
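
The first example basis is small enough to check directly. The sketch below only confirms that it is a basis of $\mathbb{N}[13]$ and computes its two maxima; it does not prove optimality, which requires ruling out every other basis. The helper name `counts` is chosen here.

```python
def counts(P, limit):
    """List c with c[n] = r(P, n) for 0 <= n <= limit."""
    c = [0] * (limit + 1)
    for a in P:
        for b in P:
            if a + b <= limit:
                c[a + b] += 1
    return c


B = (0, 1, 2, 5, 8, 11)
x = 13
c = counts(B, 2 * max(B))

covers = all(c[n] >= 1 for n in range(x + 1))  # B is a basis of [0, 13]
rho_B = max(c[: x + 1])                        # rho(B, 13)
s_B = max(c)                                   # s(B)
print(covers, rho_B, s_B)
```

Both maxima are attained at $n = 13 = 2 + 11 = 5 + 8$, which has four ordered representations.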


Here are some values of the three main functions studied above that were obtained by 
computer calculations, using the software Maple. 


The $\rho$ function.
$\rho(0) = 1$
$\rho(x) = 2$ for $1 \le x \le 5$
$\rho(x) = 3$ for $6 \le x \le 12$
$\rho(x) = 4$ for $13 \le x \le 55$
$\rho(x) = 5$ for $56 \le x \le 69$
$\rho(x) = 6$ for $70 \le x \le 233$; and $\rho(234) \ge 6$.


The $\tau$ function.
$\tau(0) = 1$
$\tau(x) = 2$ for $1 \le x \le 4$
$\tau(x) = 3$ for $5 \le x \le 10$
$\tau(x) = 4$ for $11 \le x \le 45$
$\tau(x) = 5$ for $46 \le x \le 59$
$\tau(x) = 6$ for $60 \le x \le 223$; and $\tau(224) \ge 6$.


The $\sigma$ function.
$\sigma(1) = 1$
$\sigma(x) = 2$ for $2 \le x \le 3$
$\sigma(x) = 3$ for $4 \le x \le 5$
$\sigma(x) = 4$ for $6 \le x \le 12$
$\sigma(x) = 5$ for $13 \le x \le 14$
$\sigma(x) = 6$ for $15 \le x \le 33$; and $\sigma(34) \ge 6$.


Theorem 3. We have $\lambda \ge 6$; i.e. for any $B \in \mathcal{B}(\mathbb{N})$, we have $s(B) \ge 6$.


Proof. This results from Lemma 4, the fact that the function $\rho$, or $\tau$ or $\sigma$, is increasing,
and from its highest calculated value, listed above. □


REFERENCES


1. G. A. Dirac, Note on a problem in additive number theory, J. London Math. Soc. 26 (1951),
312-313; MR 13,326b.

2. M. Dowd, Questions related to the Erdős-Turán conjecture, SIAM J. Discrete Math. 1 (1988),
142-150; MR 89h:11006.

3. P. Erdős and P. Turán, On a problem of Sidon in additive number theory, and on some related
problems, J. London Math. Soc. 16 (1941), 212-215; MR 3,270e.

4. P. Erdős and W. H. J. Fuchs, On a problem of additive number theory, J. London Math. Soc. 31
(1956), 67-73; MR 17,586d.

5. G. Grekos, L. Haddad, C. Helou, J. Pihko, On the Erdős-Turán conjecture, to appear in J.
Number Theory.

6. M. B. Nathanson, Unique representation bases for the integers, arXiv:math.NT/0202137 v1
(February 14, 2002), 10 pages.

7. M. B. Nathanson, Generalized additive bases, König's lemma, and the Erdős-Turán conjecture,
preprint (February 21, 2003), 8 pages.

8. I. Z. Ruzsa, A just basis, Monatsh. Math. 109 (1990), 145-151; MR 91e:11016.


PENN STATE UNIV., 25 YEARSLEY MILL RD, MEDIA, PA 19063, USA; E-MAIL: CXH22@psu.EDU 


The boundary structure of the sumset in $\mathbb{Z}^2$ *


Shu-Ping Sandie Han 
Department of Mathematics 
New York City College of Technology 
City University of New York 
shan@citytech.cuny.edu 


Abstract 


Let $A$ be a finite subset of $\mathbb{Z}^2$. Let $h$ be a positive integer. Let $hA$ be the sumset
defined by

\[ hA = \Big\{ h_1 a_1 + \cdots + h_k a_k \;\Big|\; a_i \in A,\ \sum_{i=1}^{k} h_i = h \Big\} \]

where $k = |A|$. It is found that the distribution of the elements of $hA$ in the boundary
region of the convex hull of $hA$ exhibits a repeating pattern. In other words, if each
side of the boundary region of the convex hull of $hA$ is partitioned into $h$ cells, then for
$h$ sufficiently large there exist a constant $C$ and $h - C$ consecutive
congruent parallelograms such that the elements of $hA$ in each parallelogram can be
translated by a constant vector to obtain the elements of $hA$ in the next parallelogram. By
counting the number of parallelograms and the cardinality of $hA$ in each parallelogram,
it can be found that the cardinality of $hA$ in the boundary region is a linear function
of $h$.


1 Introduction 


Many studies have been done on the structure and the cardinality of sums of sets. In
particular, let $h$ be a positive integer, and let $A$ be a finite subset of $\mathbb{Z}^n$; the structure and
the cardinality of the $h$-fold sumset of $A$, denoted by $hA$, for sufficiently large $h$ can be
approximated by studying the convex hull of $hA$.


*supported by PSC CUNY Grant 




It was found by Nathanson that when $A$ is a set of integers, the structure of the $h$-fold
sumset of $A$ consists of an interval of consecutive integers and the cardinality of $hA$ is a
linear function of $h$. It was found by Khovanskii that when $A$ is a finite set of lattice points
in $\mathbb{Z}^n$, the structure of the $h$-fold sumset of $A$ consists of a polytope such that all of the
lattice points in the polytope are contained in $hA$. In other words, Khovanskii showed that
there exists a positive real number $p$ such that all lattice points in the convex hull of $hA$,
whose distance from the boundary is greater than or equal to $p$, belong to $hA$. By using the
volume of the polytope to approximate the cardinality of $hA$, Khovanskii showed that the
cardinality of $hA$ is a function of $h^n$.

The author examines the structure of the sumset from a different perspective. Both Nathanson
and Khovanskii studied the "core" structure of $hA$. In other words, Nathanson and
Khovanskii studied the distribution of those elements of $hA$ that constitute the interior lattice
points of the convex hull of $hA$. The author will study instead the "boundary" structure
of $hA$. The focus is on the distribution of those elements of $hA$ that are less than a given
distance away from the boundary of the convex hull of $hA$. As shown by Khovanskii, the
cardinality of the elements of $hA$ in the "core" structure is a function of $h^n$. It is conjectured
that the cardinality of these "boundary" elements of $hA$ is of order $h^{n-1}$. This paper will examine
the case where $A$ is a finite subset of $\mathbb{Z}^2$. The author will show that there is a repeating
and consistent pattern in the "boundary" structure of $hA$ and that the cardinality of these
"boundary" elements is a linear function of $h$.


2 Notation and example 


Let $\Delta_{hA}$ denote the convex hull of $hA$. When we speak of the boundary or the interior of
$hA$, we mean the boundary or the interior of $\Delta_{hA}$. Let $x, y, a, b$ be elements in $\mathbb{R}^2$. Let
$l(x,y)$ denote the line segment connecting the two points $x$ and $y$. Let $d(a,b)$ denote the
distance between the two points $a$ and $b$. Let the set

\[ \{a \in \mathbb{R}^2 \mid d(a, l(x,y)) = p\} \]

denote the set of all elements $a$ such that $a$ is exactly $p$ distance away from the line segment
$l(x,y)$. Similarly, the set

\[ \{a \in \mathbb{R}^2 \mid d(a, l(x,y)) < p\} \]

denotes the set of all elements $a$ such that $a$ is less than $p$ distance away from the line
segment $l(x,y)$.
If $l(x,y)$ is a boundary line of $\Delta_{hA}$, the set

\[ \{a \in \Delta_{hA} \mid d(a, l(x,y)) < p\} \]

is also referred to as the "boundary region." The elements of $hA$ in the boundary region are
called the "boundary elements" of $hA$.


The following simple example in $\mathbb{Z}^2$ will illustrate the regularity in the distribution of the
boundary elements of $hA$. In fact, the example is so simple that there is a regularity in the
distribution of the elements of $hA$ throughout the interior of $\Delta_{hA}$.

Suppose $A$ is a set of lattice points in $\mathbb{Z}^2$ such that $A$ contains only three elements. Let
$A = \{0, a_1, a_2\}$, where $0 = (0,0)$ and $a_1 \ne k a_2$ for any real number $k$. Let $h$ be a positive
integer, then

\[ hA = \{k_1 a_1 + k_2 a_2 \mid k_1 = 0, 1, \dots, h \text{ and } k_2 = 0, 1, \dots, h - k_1\} \]

The cardinality of $hA$ is

\[ |hA| = \frac{(h+2)(h+1)}{2} = \frac{h^2}{2} + \frac{3h}{2} + 1 \]

The boundary of $\Delta_{hA}$ consists of three line segments: $l(0, ha_1)$, $l(0, ha_2)$, and $l(ha_1, ha_2)$.
Moreover, the boundary of $\Delta_{hA}$ consists of these elements of $hA$:

\[ \{k_1 a_1 \mid k_1 = 0, 1, \dots, h-1\} \cup \{k_2 a_2 \mid k_2 = 1, \dots, h-1\} \cup \{k_1 a_1 + k_2 a_2 \mid k_1 = 0, 1, \dots, h,\ k_2 = h - k_1\} \]

Thus,

\[ |hA \cap \text{boundary}| = 3h \]
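
These two counts are simple to confirm by direct enumeration. The sketch below takes the concrete choice $a_1 = (1,0)$, $a_2 = (0,1)$, which is an assumption made here for illustration, so that the three boundary segments are given by $k_1 = 0$, $k_2 = 0$ and $k_1 + k_2 = h$.

```python
def h_fold_sumset(A, h):
    """All sums of h elements of A (repetition allowed), as lattice points."""
    current = {(0, 0)}
    for _ in range(h):
        current = {(p[0] + a[0], p[1] + a[1]) for p in current for a in A}
    return current


A = [(0, 0), (1, 0), (0, 1)]  # 0, a1, a2 with a1 and a2 not collinear

for h in range(1, 7):
    hA = h_fold_sumset(A, h)
    # With this choice of a1, a2 the convex hull is the triangle
    # with vertices (0, 0), (h, 0), (0, h).
    on_boundary = [p for p in hA if p[0] == 0 or p[1] == 0 or p[0] + p[1] == h]
    print(h, len(hA), (h + 2) * (h + 1) // 2, len(on_boundary), 3 * h)
```

Because 0 belongs to $A$, sums of fewer than $h$ nonzero elements are included automatically, in agreement with the description of $hA$ above.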


A more interesting problem is to examine not just those elements that lie on
the boundary of $\Delta_{hA}$, but also those elements of $hA$ that fall within a given distance from
the boundary. Define the following boundary regions with thickness $p$:

\[ F_1 = F_1(h,p) = \{a \in \Delta_{hA} \mid d(a, l(0, ha_1)) < p\} \]
\[ F_2 = F_2(h,p) = \{a \in \Delta_{hA} \mid d(a, l(0, ha_2)) < p\} \]
\[ F_3 = F_3(h,p) = \{a \in \Delta_{hA} \mid d(a, l(ha_1, ha_2)) < p\} \]

Let $m_1, m_2, m_3$ be three positive integers defined as follows:

\[ m_1 = \max\{n \in \mathbb{Z}^+ \mid d(na_2, l(0, ha_1)) < p\} \]
\[ m_2 = \max\{n \in \mathbb{Z}^+ \mid d(na_1, l(0, ha_2)) < p\} \]
\[ m_3 = \max\{n \in \mathbb{Z}^+ \mid d(ha_1 - na_1, l(ha_1, ha_2)) < p\} \]

For $h$ sufficiently large, the distance $p$ from the boundary is small in comparison to the
convex hull of $hA$; therefore, $F_1 \cap F_2 \cap F_3 = \emptyset$. The distribution of the elements of $hA$ in the
boundary regions can be described as follows:

\[ F_1 \cap hA = \bigcup_{k_1=0}^{h-m_1} \{k_1 a_1 + k_2 a_2 \mid k_2 = 0, 1, \dots, m_1\} \cup \bigcup_{k_1=h-m_1+1}^{h} \{k_1 a_1 + k_2 a_2 \mid k_2 = 0, 1, \dots, h - k_1\} \]

\[ F_2 \cap hA = \bigcup_{k_2=0}^{h-m_2} \{k_1 a_1 + k_2 a_2 \mid k_1 = 0, 1, \dots, m_2\} \cup \bigcup_{k_2=h-m_2+1}^{h} \{k_1 a_1 + k_2 a_2 \mid k_1 = 0, 1, \dots, h - k_2\} \]

\[ F_3 \cap hA = \bigcup_{k_1=0}^{h-m_3-1} \{k_1 a_1 + k_2 a_2 \mid k_2 = h - m_3 - k_1, \dots, h - k_1\} \cup \bigcup_{k_1=h-m_3}^{h} \{k_1 a_1 + k_2 a_2 \mid k_2 = 0, 1, \dots, h - k_1\} \]


Note that for $i = 1, 2$, the set $F_i \cap hA$ consists of a disjoint union of $h - m_i + 1$ sets, where
the consecutive sets in $F_1 \cap hA$ differ by $a_1$ and the consecutive sets in $F_2 \cap hA$ differ by $a_2$.
The union of the remaining sets in $F_i \cap hA$ has a constant cardinality. Thus, for $i = 1, 2$, the
cardinality of $F_i \cap hA$ is

\[ |F_i \cap hA| = (h - m_i + 1)(m_i + 1) + \sum_{k=h-m_i+1}^{h} (h - k) = (h+1)m_i - \frac{(m_i - 2)(m_i + 1)}{2} \]

which is a linear function of $h$.

Similarly, $F_3 \cap hA$ consists of a disjoint union of $h - m_3$ sets where the consecutive sets
differ by $a_1 - a_2$. The union of the remaining sets also has a constant cardinality. Thus, the
cardinality of $F_3 \cap hA$ is

\[ |F_3 \cap hA| = (h - m_3)m_3 + \sum_{k=h-m_3}^{h} (h - k + 1) = (h+1)m_3 - \frac{(m_3 - 2)(m_3 + 1)}{2} \]

which is again a linear function of $h$.
Furthermore,

\[ |hA \cap (F_i \cap F_j)| = (m_i + 1)(m_j + 1) \]

The cardinality of $hA$ in the boundary region can be computed:

\[ |hA \cap (F_1 \cup F_2 \cup F_3)| = |hA \cap F_1| + |hA \cap F_2| + |hA \cap F_3| - |hA \cap (F_1 \cap F_2)| - |hA \cap (F_1 \cap F_3)| - |hA \cap (F_2 \cap F_3)| \]
\[ = \sum_{i=1}^{3} \left( (h+1)m_i - \frac{(m_i - 2)(m_i + 1)}{2} \right) - \sum_{i<j} (m_i + 1)(m_j + 1) = Ch - D \]

Since $m_1, m_2, m_3$ are constants, then $C$ and $D$ are also constants where

\[ C = m_1 + m_2 + m_3 \]
\[ D = \sum_{i=1}^{3} \left( \frac{(m_i - 2)(m_i + 1)}{2} - m_i \right) + \sum_{i<j} (m_i + 1)(m_j + 1) \]

Thus, the cardinality of $hA$ in the boundary region is a linear function of $h$.


3 The distribution of hA in the boundary region 


Let $A$ be a finite set of lattice points in $\mathbb{Z}^2$. Let $h$ be a positive integer.


Definition 3.1 An element of $A$ is a vertex of $A$ if it is a vertex of the polytope formed by
the convex hull of $A$. Similarly, an element of $A$ is an interior point of $A$ if it is an interior
point of the polytope formed by the convex hull of $A$.


Let $V_A$ denote the set of vertices of $A$.


Definition 3.2 Let $a$ and $b$ be two vertices of a polytope. We say that $a$ and $b$ are
adjacent vertices if the line segment connecting $a$ and $b$ is an edge of the polytope.


Without loss of generality, we can assume that $0 \in A$. If there is an element $a \in A$
such that $a$ is an interior point of $A$, then we can assume that 0 is an interior point of $A$ by
considering the set $A - a$. If $A$ does not contain any element that is an interior point of $A$,
then we can assume that 0 is a vertex of $A$ by considering the set $A - a$ where $a$ is a vertex
of $A$.

Label the nonzero vertices of $A$ in the following way,

\[
V_A \setminus \{0\} = \{a_i\}_{i=1}^{l}
\]

so that consecutive vertices are adjacent vertices. If $0 \notin V_A$, then $a_l$ is adjacent to $a_1$. If
$0 \in V_A$, then 0 is adjacent to both $a_1$ and $a_l$. However, $a_1$ and $a_l$ are not adjacent to each
other. This particular case will be excluded from the proofs of Lemmas 3.1 and 3.2, and
will be discussed separately in the proof of Theorem 4.2. Let $\Delta(0, a_i, a_{i+1})$ be the convex
hull formed by the three elements $\{0, a_i, a_{i+1}\}$. Partition the elements of $A$ according to
$\Delta(0, a_i, a_{i+1})$ where $a_i, a_{i+1}$ are adjacent vertices:

\[
\begin{aligned}
A_i &= A \cap \Delta(0, a_i, a_{i+1}) \quad \text{for } i = 1, \ldots, l-1 \\
A_l &= A \cap \Delta(0, a_l, a_1)
\end{aligned}
\]


Lemma 3.1 Let $a_i, a_{i+1} \in \mathbb{Z}^2$. Suppose $z \in \Delta(0, a_i, a_{i+1}) \cap \mathbb{Z}^2$. Then there exist nonnegative
rational numbers $q'$ and $q''$ such that $0 \le q' + q'' \le 1$, and

\[
z = q' a_i + q'' a_{i+1}
\]

Proof.
Let $a_i = (u_1, u_2)$, $a_{i+1} = (v_1, v_2)$, and $z = (z_1, z_2)$, where $u_1, u_2, v_1, v_2, z_1, z_2$ are all integers.
The solution $(q', q'')$ to

\[
z = q' a_i + q'' a_{i+1}
\]

is the solution to the system of equations

\[
\begin{aligned}
z_1 &= u_1 q' + v_1 q'' \\
z_2 &= u_2 q' + v_2 q''
\end{aligned}
\]

The solution is rational. Moreover, since $z \in \Delta(0, a_i, a_{i+1})$, $z$ is contained in the set defined
by
\[
\{\lambda_1 a_i + \lambda_2 a_{i+1} \mid \lambda_1, \lambda_2 \in \mathbb{R}^+ \cup \{0\},\ 0 \le \lambda_1 + \lambda_2 \le 1\}
\]
which proves the lemma. □
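By Lemma 3.1, for all $a \in A_i \subseteq A$, there exist nonnegative rational numbers $q'_a$ and $q''_a$
such that $0 \le q'_a + q''_a \le 1$ and
\[
a = q'_a a_i + q''_a a_{i+1}
\]

Let
\[
m_a = \min\{m \in \mathbb{Z}^+ \cup \{0\} \mid m q'_a \in \mathbb{Z}^+ \cup \{0\} \text{ and } m q''_a \in \mathbb{Z}^+ \cup \{0\}\}
\]

Thus, there is a linear expression
\[
m_a a = m'_a a_i + m''_a a_{i+1} \qquad (1)
\]
where $m'_a = m_a q'_a$ and $m''_a = m_a q''_a$ are nonnegative integer coefficients and
\[
m_a \ge m_a (q'_a + q''_a) \ge m'_a + m''_a
\]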
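Lemma 3.1 and equation (1) can be checked numerically. The following is a minimal sketch, not the author's procedure: it solves the 2×2 system by Cramer's rule with exact rationals and then clears denominators. The function names `rational_coordinates` and `clearing_multiplier` and the sample triangle are hypothetical, and the multiplier is assumed to be the smallest positive integer that clears both denominators.

```python
from fractions import Fraction
from math import gcd

def rational_coordinates(z, a_i, a_next):
    """Solve z = q1*a_i + q2*a_next exactly by Cramer's rule."""
    (u1, u2), (v1, v2), (z1, z2) = a_i, a_next, z
    det = u1 * v2 - u2 * v1
    if det == 0:
        raise ValueError("a_i and a_next must be linearly independent")
    q1 = Fraction(z1 * v2 - z2 * v1, det)
    q2 = Fraction(u1 * z2 - u2 * z1, det)
    return q1, q2

def clearing_multiplier(q1, q2):
    """Smallest positive integer m with m*q1 and m*q2 both integers (assumed reading of m_a)."""
    d1, d2 = q1.denominator, q2.denominator
    return d1 * d2 // gcd(d1, d2)

# Hypothetical example: z = (1, 1) inside the triangle with vertices 0, (3, 0), (0, 3).
a_i, a_next, z = (3, 0), (0, 3), (1, 1)
q1, q2 = rational_coordinates(z, a_i, a_next)
m = clearing_multiplier(q1, q2)
m1, m2 = m * q1, m * q2      # the integer coefficients m'_a and m''_a of equation (1)
print(q1, q2, m, m1, m2)
```

Here $q' = q'' = 1/3$, so $3z = a_i + a_{i+1}$, which is an instance of equation (1) with $m_a = 3 \ge m'_a + m''_a = 2$.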


Since $A$ is finite, let

\[
\begin{aligned}
m^* &= \max_{a \in A} \{m_a\} \\
k^* &= \max_{i=1,\ldots,l} |A_i| \\
N &= \max_{a \in A} \|a\|
\end{aligned}
\]

Let
\[
\varepsilon = m^* k^* N
\]


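For $i = 1, \ldots, l$, define a subset of $\Delta_{hA}$ such that

\[
K_i(h) = \Delta(0, ha_1, \ldots, ha_{i-1}, ha_{i+1}, \ldots, ha_l)
\]

is the convex hull formed by the elements of $hV_A \cup \{0\} \setminus \{ha_i\}$. Define a subset of $\Delta(0, ha_i, ha_{i+1})$
that intersects the boundary region of $\Delta_{hA}$ as follows:

\[
\begin{aligned}
\Delta_i(h) &= \{x \in \Delta(0, ha_i, ha_{i+1}) \mid d(x, K_i \cup K_{i+1}) > \varepsilon\} \quad \text{for } i = 1, \ldots, l-1 \\
\Delta_l(h) &= \{x \in \Delta(0, ha_l, ha_1) \mid d(x, K_l \cup K_1) > \varepsilon\}
\end{aligned}
\]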


The following lemma considers a special property of an element in the boundary region
of $hA$.


Lemma 3.2 Let $A$ be a finite subset of $\mathbb{Z}^2$ containing 0. Let $h$ be a positive integer. Suppose
$a_i$ and $a_{i+1}$ are adjacent vertices. For $i = 1, \ldots, l$, if $w \in \Delta_i(h) \cap hA$, then $w - a_i \in (h-1)A$
and $w - a_{i+1} \in (h-1)A$.


Proof.
Define a subset $A'_i \subseteq A_i \subseteq A$, where

\[
A'_i = A_i \setminus l(0, a_i)
\]

which does not contain those elements of $A_i$ that lie on the line segment $l(0, a_i)$.

As an element of $hA$, $w$ can be represented in terms of the elements of $A$:

\[
w = \sum_{\substack{a \in A \\ a \notin A'_i}} h_a a + \sum_{a \in A'_i} h_a a
\]

where $\sum_{a \in A} h_a = h$. Let

\[
w' = \sum_{\substack{a \in A \\ a \notin A'_i}} h_a a
\]

Then $w' \in K_{i+1}(h)$. Since $w \in \Delta_i(h)$, by the definition of $\Delta_i(h)$,

\[
|w - w'| = \Bigl| \sum_{a \in A'_i} h_a a \Bigr| > \varepsilon = m^* k^* N
\]
But
\[
\sum_{a \in A'_i} h_a N \ge \sum_{a \in A'_i} h_a |a| \ge \Bigl| \sum_{a \in A'_i} h_a a \Bigr| > \varepsilon = m^* k^* N
\]
This implies that
\[
\sum_{a \in A'_i} h_a > m^* k^*
\]

Since $k^* \ge |A'_i|$ for all $i$, there exists an $a \in A'_i$ such that $h_a > m^*$. Thus, by equation (1),
\[
h_a a = (h_a - m_a) a + m'_a a_i + m''_a a_{i+1}
\]
where $m_a \ge m'_a + m''_a$ and $m'_a$ and $m''_a$ are both nonnegative integers. Furthermore, $m''_a \ne 0$,
because $a \in A'_i$. This proves that $w - a_{i+1} \in (h-1)A$.
Similarly, let
\[
A''_i = A_i \setminus l(0, a_{i+1})
\]
It can be proven that $w - a_i \in (h-1)A$. □


Let $\rho$ be a nonnegative real number. Define
\[
l_i(h, \rho) = \{x \in \Delta(0, ha_i, ha_{i+1}) \mid d(x, l(ha_i, ha_{i+1})) = \rho\}
\]
to be the line that intersects $\Delta(0, ha_i, ha_{i+1})$, is parallel to $l(ha_i, ha_{i+1})$, and lies at
distance $\rho$ from $l(ha_i, ha_{i+1})$. Define the boundary region of $\Delta(0, ha_i, ha_{i+1})$ as:

\[
F_i(h, \rho) = \{x \in \Delta(0, ha_i, ha_{i+1}) \mid d(x, l(ha_i, ha_{i+1})) < \rho\}
\]

which is the region that contains all those elements of $\Delta(0, ha_i, ha_{i+1})$ that are less than $\rho$
distance away from the boundary line $l(ha_i, ha_{i+1})$.

For small $h$, $F_i(h, \rho) \cap \Delta_i(h) = \Delta_i(h)$. For $h$ sufficiently large, $F_i(h, \rho) \cap \Delta_i(h) \ne \Delta_i(h)$.
In other words, there exists an element $x \in \Delta_i(h)$ such that $x \notin F_i(h, \rho)$.


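Lemma 3.3 Let $h$ be a positive integer. For $h$ sufficiently large, there exist positive integers
$M_i$ and $N_i$ independent of $h$ such that for all real numbers $r$ such that $M_i \le r \le h - N_i$,

\[
\{r a_i + t a_{i+1} \mid t \in \mathbb{R},\ h - \delta_i - r \le t \le h - r\} \subseteq \Delta_i(h)
\]

where

\[
\delta_i = \{\delta \in \mathbb{R}^+ \mid (h - \delta) a_{i+1} \in l_i(h, \rho)\}
\]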


Proof.
Define for all positive integers $h$,

\[
\lambda_1(h) = \min\{\lambda \in \mathbb{R} \mid \lambda a_i + (h - \lambda - \delta_i) a_{i+1} \in \Delta_i(h)\}
\]

Let $h'$ be a positive integer, and let $u(h') = \lambda_1(h') a_i + (h' - \lambda_1(h') - \delta_i) a_{i+1}$; then $u(h')$ is an
intersection of $l_i(h', \rho)$ with the boundary of $\Delta_i(h')$. Since the line $l_i(h', \rho)$ is parallel to the
line $l(h'a_i, h'a_{i+1})$, and the boundary of $\Delta_i(h')$ is parallel to the boundary of $K_i(h')$, the four
lines form a parallelogram. Furthermore, $u(h')$ and $h'a_{i+1}$ are the opposite vertices of the
parallelogram. Let $h''$ be a positive integer different from $h'$. Define $u(h'') = \lambda_1(h'') a_i + (h'' -
\lambda_1(h'') - \delta_i) a_{i+1}$. Then $u(h'')$ and $h''a_{i+1}$ are the opposite vertices of the parallelogram formed
by the lines $l_i(h'', \rho)$, $l(h''a_i, h''a_{i+1})$, and the boundary lines of $\Delta_i(h'')$ and $K_i(h'')$. Since
$l_i(h, \rho)$ is $\rho$ distance away from $l(ha_i, ha_{i+1})$ and the boundary of $\Delta_i(h)$ is $\varepsilon$ distance away
from the boundary of $K_i(h)$ for all positive integers $h$, the two parallelograms are congruent.
Thus, the corresponding diagonals of the parallelograms, $u(h') - h'a_{i+1}$ and $u(h'') - h''a_{i+1}$,
are equal. Hence, for all positive integers $h'$ and $h''$,

\[
\begin{aligned}
u(h'') - h''a_{i+1} &= u(h') - h'a_{i+1} \\
\lambda_1(h'') a_i + (h'' - \lambda_1(h'') - \delta_i) a_{i+1} - h''a_{i+1} &= \lambda_1(h') a_i + (h' - \lambda_1(h') - \delta_i) a_{i+1} - h'a_{i+1} \\
\lambda_1(h'')(a_i - a_{i+1}) - \delta_i a_{i+1} &= \lambda_1(h')(a_i - a_{i+1}) - \delta_i a_{i+1}
\end{aligned}
\]

This implies
\[
\lambda_1(h') = \lambda_1(h'') = \lambda_1
\]
is a constant independent of $h$. Let $M_i = \lceil \lambda_1 \rceil$, the least integer greater than $\lambda_1$; then $M_i$ is
a constant independent of $h$.
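Similarly, for all positive integers $h$, define

\[
\lambda_2(h) = \min\{\lambda \in \mathbb{R} \mid (h - \lambda) a_i + (\lambda - \delta_i) a_{i+1} \in \Delta_i(h)\}
\]

It can be shown again that for two different positive integers $h'$ and $h''$, the parallelogram with
opposite vertices $(h' - \lambda_2(h')) a_i + (\lambda_2(h') - \delta_i) a_{i+1}$ and $h'a_i$ is congruent to the parallelogram
with opposite vertices $(h'' - \lambda_2(h'')) a_i + (\lambda_2(h'') - \delta_i) a_{i+1}$ and $h''a_i$. Thus,

\[
\begin{aligned}
(h'' - \lambda_2(h'')) a_i + (\lambda_2(h'') - \delta_i) a_{i+1} - h''a_i &= (h' - \lambda_2(h')) a_i + (\lambda_2(h') - \delta_i) a_{i+1} - h'a_i \\
\lambda_2(h'')(a_{i+1} - a_i) - \delta_i a_{i+1} &= \lambda_2(h')(a_{i+1} - a_i) - \delta_i a_{i+1}
\end{aligned}
\]

This implies
\[
\lambda_2(h') = \lambda_2(h'') = \lambda_2
\]
is a constant independent of $h$. Let $N_i = \lfloor \lambda_2 \rfloor$, the greatest integer less than $\lambda_2$. So $N_i$ is a
constant independent of $h$.
Moreover, for all real numbers $r$ such that $M_i \le r \le h - N_i$,

\[
\{r a_i + t a_{i+1} \mid t \in \mathbb{R},\ h - \delta_i - r \le t \le h - r\} \subseteq \Delta_i(h)
\]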


Let $b_{i,n}(h) \in l(ha_i, ha_{i+1})$ be determined by $n$ and $h$ as follows:
\[
b_{i,n}(h) = n a_i + (h - n) a_{i+1}
\]
For each nonnegative integer $n$ such that $0 \le n \le h - 1$, define a subset of $\mathbb{R}^2$:
\[
B_{i,n}(h) = \{b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) \mid 0 \le r < \delta_i,\ 0 \le t < 1\} \qquad (2)
\]

For $M_i \le n \le h - N_i - 1$,
\[
B_{i,n}(h) \subseteq F_i(h, \rho) \cap \Delta_i(h)
\]


The following lemma shows that for all nonnegative integers $m$ and $n$ such that $0 \le
m, n \le h - 2$, the elements of $B_{i,m}(h)$ are the elements of $B_{i,n}(h)$ translated by special
elements of $hA$. This will play a role later in the article when we consider the boundary
elements of $hA$.


Lemma 3.4 Let $h$ be a positive integer whose value is fixed. For all nonnegative integers $n$
such that $0 \le n \le h - 2$, $B_{i,n+1}(h) = B_{i,n}(h) + a_i - a_{i+1}$.


Proof.
Let $x = b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) \in B_{i,n}(h)$ for some $r$ and $t$ such that $0 \le r < \delta_i$ and
$0 \le t < 1$. Then

\[
\begin{aligned}
x &= b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= n a_i + (h - n) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) \\
x + a_i - a_{i+1} &= (n+1) a_i + (h - (n+1)) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= b_{i,n+1}(h) - r a_{i+1} + t(a_i - a_{i+1})
\end{aligned}
\]

By definition, $x + a_i - a_{i+1} \in B_{i,n+1}(h)$. Thus, $B_{i,n}(h) + a_i - a_{i+1} \subseteq B_{i,n+1}(h)$.
Conversely, if $x \in B_{i,n+1}(h)$, then

\[
\begin{aligned}
x &= b_{i,n+1}(h) - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= (n+1) a_i + (h - (n+1)) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= n a_i + (h - n) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) + a_i - a_{i+1} \\
  &= b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) + a_i - a_{i+1}
\end{aligned}
\]

Let $y = b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) \in B_{i,n}(h)$, so $x = y + a_i - a_{i+1} \in B_{i,n}(h) + a_i - a_{i+1}$.
Thus, $B_{i,n+1}(h) \subseteq B_{i,n}(h) + a_i - a_{i+1}$. Therefore $B_{i,n+1}(h) = B_{i,n}(h) + a_i - a_{i+1}$. □


The following lemma shows that for $h$ sufficiently large, the set $B_{i,n}(h+1)$ is a translation
of the set $B_{i,n}(h)$.


Lemma 3.5 For all nonnegative integers $n$ such that $0 \le n \le h - 1$, $B_{i,n}(h+1) = B_{i,n}(h) +
a_{i+1}$.

Proof.

Again, let $x \in B_{i,n}(h)$; then for some real numbers $r$ and $t$ such that $0 \le r < \delta_i$, $0 \le t < 1$,
we have

\[
\begin{aligned}
x &= b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= n a_i + (h - n) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) \\
x + a_{i+1} &= n a_i + ((h+1) - n) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= b_{i,n}(h+1) - r a_{i+1} + t(a_i - a_{i+1})
\end{aligned}
\]

Thus, $x + a_{i+1} \in B_{i,n}(h+1)$, implying $B_{i,n}(h) + a_{i+1} \subseteq B_{i,n}(h+1)$.
Conversely, if $x \in B_{i,n}(h+1)$, then

\[
\begin{aligned}
x &= b_{i,n}(h+1) - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= n a_i + ((h+1) - n) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) \\
  &= n a_i + (h - n) a_{i+1} - r a_{i+1} + t(a_i - a_{i+1}) + a_{i+1} \\
  &= b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) + a_{i+1}
\end{aligned}
\]

There exists an element $y = b_{i,n}(h) - r a_{i+1} + t(a_i - a_{i+1}) \in B_{i,n}(h)$ such that $x = y + a_{i+1}$; thus,
$x \in B_{i,n}(h) + a_{i+1}$, implying $B_{i,n}(h+1) \subseteq B_{i,n}(h) + a_{i+1}$. Hence, $B_{i,n}(h+1) = B_{i,n}(h) + a_{i+1}$.
□
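The two translation rules in Lemmas 3.4 and 3.5 come down to how the base point $b_{i,n}(h) = n a_i + (h - n) a_{i+1}$ moves when $n$ or $h$ increases by one. A small sketch with exact integer arithmetic can confirm this; the helper names `b_point`, `add`, `sub` and the sample vertices are hypothetical.

```python
def add(p, q):
    """Componentwise sum of two points of the plane."""
    return (p[0] + q[0], p[1] + q[1])

def sub(p, q):
    """Componentwise difference of two points of the plane."""
    return (p[0] - q[0], p[1] - q[1])

def b_point(n, h, a_i, a_next):
    """Base point b_{i,n}(h) = n*a_i + (h - n)*a_next on the segment from h*a_next to h*a_i."""
    return (n * a_i[0] + (h - n) * a_next[0], n * a_i[1] + (h - n) * a_next[1])

a_i, a_next = (3, 1), (1, 2)      # hypothetical adjacent vertices
h = 10
for n in range(h - 1):
    # Lemma 3.4: stepping n -> n+1 translates the cell by a_i - a_next.
    assert b_point(n + 1, h, a_i, a_next) == add(b_point(n, h, a_i, a_next), sub(a_i, a_next))
    # Lemma 3.5: stepping h -> h+1 translates the cell by a_next.
    assert b_point(n, h + 1, a_i, a_next) == add(b_point(n, h, a_i, a_next), a_next)
print("translation identities hold for h =", h)
```

Because the offsets $-r a_{i+1} + t(a_i - a_{i+1})$ in (2) do not depend on $n$ or $h$, the same translations carry each whole cell onto the next one.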


Define
\[
B_{i,F}(h) = \bigcup_{n=0}^{M_i-1} B_{i,n}(h) \qquad (3)
\]
\[
B_{i,L}(h) = \Bigl( \bigcup_{n=h-N_i}^{h-1} B_{i,n}(h) \Bigr) \cap F_i(h, \rho) \qquad (4)
\]

The set $B_{i,F}(h)$ represents the area of the first cell, which is bounded by the lines $l(ha_i, ha_{i+1})$,
$l_i(h, \rho)$, $l(0, ha_{i+1})$, and $l_{i,M_i}(h)$. On the other hand, $B_{i,L}(h)$ represents the area of the last
cell, which is bounded by the lines $l(ha_i, ha_{i+1})$, $l_i(h, \rho)$, $l_{i,N_i}(h)$, and $l(0, ha_i)$.


The next two lemmas consider the first and the last cell separately, but prove the same
result: the $(h+1)$-level cell is the translation of the $h$-level cell, hence the cardinality of the
cell is independent of $h$.


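Lemma 3.6 $B_{i,F}(h+1) = B_{i,F}(h) + a_{i+1}$.


Proof.
From equation (3) and Lemma 3.5,

\[
\begin{aligned}
B_{i,F}(h) + a_{i+1} &= \bigcup_{n=0}^{M_i-1} B_{i,n}(h) + a_{i+1} \\
                     &= \bigcup_{n=0}^{M_i-1} B_{i,n}(h+1) \\
                     &= B_{i,F}(h+1)
\end{aligned}
\]

This proves the lemma. □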


Lemma 3.7 $B_{i,L}(h+1) = B_{i,L}(h) + a_{i+1}$


Proof.

Suppose $x \in B_{i,L}(h)$; then $x \in F_i(h, \rho)$ and $x \in B_{i,n}(h)$ for some nonnegative integer $n$ such
that $h - N_i \le n \le h - 1$. By Lemma 3.5, $x + a_{i+1} \in B_{i,n}(h+1)$.
To complete the proof, we need to show that $x + a_{i+1} \in F_i(h+1, \rho)$. Let $w$ be a point
on the line $l(ha_i, ha_{i+1})$ such that $|w - x| = D < \rho$. Since $w + a_{i+1}$ is a point on the line
$l((h+1)a_i, (h+1)a_{i+1})$, then

\[
d(x + a_{i+1},\ l((h+1)a_i, (h+1)a_{i+1})) \le |(w + a_{i+1}) - (x + a_{i+1})| = |w - x| = D < \rho
\]

Thus, $x + a_{i+1} \in F_i(h+1, \rho)$; therefore $x + a_{i+1} \in B_{i,L}(h+1)$, implying $B_{i,L}(h) + a_{i+1} \subseteq
B_{i,L}(h+1)$.

Conversely, let $x \in B_{i,L}(h+1)$; then $x \in F_i(h+1, \rho)$ and $x \in B_{i,n}(h+1)$. There exists
an element $y \in B_{i,n}(h)$ such that $x = y + a_{i+1}$. Using a similar argument as before, it can
be shown that $y \in F_i(h, \rho)$ and $y \in B_{i,L}(h)$, thus implying $x \in B_{i,L}(h) + a_{i+1}$. This proves
the lemma. □


Let $S_{i,n}(h)$ denote the subset of $hA$ that belongs to $B_{i,n}(h)$. Let $U_i(h)$ and $V_i(h)$ denote
the subsets of $hA$ that belong to the first and the last cell respectively; thus, we have the
following definitions:

\[
\begin{aligned}
S_{i,n}(h) &= hA \cap B_{i,n}(h) \quad \text{for } M_i \le n \le h - N_i - 1 && (5) \\
U_i(h) &= hA \cap B_{i,F}(h) && (6) \\
V_i(h) &= hA \cap B_{i,L}(h) && (7)
\end{aligned}
\]

The next two lemmas consider the elements in the boundary region of $hA$ and prove that
the cardinalities of $hA$ in the cells $B_{i,n}(h)$ are equal and independent of $h$.


Lemma 3.8 For all nonnegative integers $n$ such that $M_i \le n \le h - N_i - 2$,

\[
\begin{aligned}
S_{i,n+1}(h) &= S_{i,n}(h) + a_i - a_{i+1} \quad \text{and} \\
|S_{i,n+1}(h)| &= |S_{i,n}(h)|.
\end{aligned}
\]

Proof.

Suppose $x \in S_{i,n}(h)$; then $x \in hA$ and $x \in B_{i,n}(h)$. Lemma 3.4 implies that $x + a_i - a_{i+1} \in
B_{i,n+1}(h)$. It remains to prove that $x + a_i - a_{i+1} \in hA$. Since $B_{i,n}(h) \subseteq F_i(h, \rho) \cap \Delta_i(h)$
for all nonnegative integers $n$ such that $M_i \le n \le h - N_i - 1$, we have $x \in \Delta_i(h)$. Moreover,
$x \in \Delta_i(h) \cap hA$. By Lemma 3.2, both $x - a_i$ and $x - a_{i+1}$ are in $(h-1)A$. Thus,

\[
x + a_i - a_{i+1} = (x - a_{i+1}) + a_i \in (h-1)A + A = hA
\]

Therefore, $x + a_i - a_{i+1} \in S_{i,n+1}(h)$ and thus $S_{i,n}(h) + a_i - a_{i+1} \subseteq S_{i,n+1}(h)$.

Conversely, if $x \in S_{i,n+1}(h)$, then by definition $x \in hA$ and $x \in B_{i,n+1}(h)$. Lemma 3.4 shows
that there exists an element $y \in B_{i,n}(h)$ such that $x = y + a_i - a_{i+1}$. Let $y = x - a_i + a_{i+1}$.
Again, $x \in \Delta_i(h)$ since $B_{i,n+1}(h) \subseteq F_i(h, \rho) \cap \Delta_i(h)$, so by Lemma 3.2, both $x - a_i$ and
$x - a_{i+1}$ are in $(h-1)A$. Therefore,

\[
y = x - a_i + a_{i+1} = (x - a_i) + a_{i+1} \in (h-1)A + A = hA
\]

implying $y \in S_{i,n}(h)$, which implies $x \in S_{i,n}(h) + a_i - a_{i+1}$. Hence, $S_{i,n+1}(h) \subseteq S_{i,n}(h) + a_i -
a_{i+1}$, and so $S_{i,n+1}(h) = S_{i,n}(h) + a_i - a_{i+1}$. This also proves that $|S_{i,n+1}(h)| = |S_{i,n}(h)|$. □


Lemma 3.9 For all nonnegative integers $n$ such that $M_i \le n \le h - N_i - 1$,

\[
\begin{aligned}
S_{i,n}(h+1) &= S_{i,n}(h) + a_{i+1} \quad \text{and} \\
|S_{i,n}(h+1)| &= |S_{i,n}(h)|.
\end{aligned}
\]

Proof.
The result follows directly from Lemma 3.5. □


Lemma 3.10
\[
U_i(h+1) = U_i(h) + a_{i+1} \quad \text{and} \quad V_i(h+1) = V_i(h) + a_{i+1}
\]
Furthermore,
\[
|U_i(h+1)| = |U_i(h)| \quad \text{and} \quad |V_i(h+1)| = |V_i(h)|.
\]
Proof.
Refer to equations (6) and (7) for the definitions of $U_i(h)$ and $V_i(h)$. The result follows
directly from Lemmas 3.6 and 3.7. □


The interesting finding of this research is that for $h$ sufficiently large and for all $M_i \le n \le
h - N_i - 1$, the cardinality of $hA$ in each set $S_{i,n}(h)$ is a constant independent of $h$ and $n$. The
cardinalities of $U_i(h)$ and $V_i(h)$, the first and the last cell of the partition, are also constants
independent of $h$. Thus the cardinality of $hA$ in the boundary region of $\Delta(0, ha_i, ha_{i+1})$ is

\[
|S_{i,n}(h)| \cdot (\text{number of cells}) + |U_i(h)| + |V_i(h)|
\]

The only component of the above expression that depends on $h$ is the number of cells, which
is determined by $M_i$ and $N_i$.


4 Main theorems on the distribution and the cardinality of the boundary elements of $hA$ in $\mathbb{Z}^2$


The following theorem proves the special case by considering the pattern of $hA$ in the
boundary region of $\Delta(0, ha_i, ha_{i+1})$ where $a_i$ and $a_{i+1}$ are two adjacent vertices of $A$.


Theorem 4.1 Let $A$ be a finite subset of $\mathbb{Z}^2$ containing 0. Suppose $a_i$ and $a_{i+1}$ are two
adjacent vertices of $A$. Let $h$ be a positive integer and $\rho$ a nonnegative real number. Define

\[
F_i(h, \rho) = \{x \in \Delta(0, ha_i, ha_{i+1}) \mid d(x, l(ha_i, ha_{i+1})) < \rho\} \qquad (8)
\]
Then for $h$ sufficiently large
\[
|F_i(h, \rho) \cap hA| = C_i \cdot h - D_i
\]
where $C_i$ and $D_i$ are some constants.


Proof.
Let $M_i$ and $N_i$ be two constants independent of $h$, determined by the intersection of the
line $l_i(h, \rho)$ with the boundary of $\Delta_i(h)$ as in Lemma 3.3. Define $B_{i,n}(h)$, $S_{i,n}(h)$, $U_i(h)$, and
$V_i(h)$ as in equations (2), (5), (6), and (7).
For $h$ sufficiently large,
\[
F_i(h, \rho) \cap hA = \bigcup_{n=M_i}^{h-N_i-1} S_{i,n}(h) \cup U_i(h) \cup V_i(h)
\]
where the intersection of any two sets in the union is empty. Thus the cardinality is

\[
|F_i(h, \rho) \cap hA| = \sum_{n=M_i}^{h-N_i-1} |S_{i,n}(h)| + |U_i(h)| + |V_i(h)|
\]

Let $C_i = |S_{i,n}(h)|$. Then $C_i$ is a constant independent of $n$ and $h$ by Lemmas 3.8 and 3.9.
Furthermore, the cardinalities of $U_i(h)$ and $V_i(h)$ are constants independent of $h$ by
Lemma 3.10.

Therefore, the number of elements of $hA$ that are within $\rho$ distance of the face
$l(ha_i, ha_{i+1})$ is:

\[
\begin{aligned}
|F_i(h, \rho) \cap hA| &= \sum_{n=M_i}^{h-N_i-1} |S_{i,n}(h)| + |U_i(h)| + |V_i(h)| \\
  &= (h - N_i - M_i)\,C_i + |U_i(h)| + |V_i(h)| \\
  &= C_i \cdot h - D_i
\end{aligned}
\]

where $D_i = (M_i + N_i) \cdot C_i - |U_i(h)| - |V_i(h)|$. □


By using the result of Theorem 4.1, Theorem 4.2 proves the general case.


Theorem 4.2 Let $A$ be a finite set of lattice points in $\mathbb{Z}^2$. Let $\partial\Delta_{hA}$ denote the boundary
of $\Delta_{hA}$. Let $h$ be a positive integer. Define

\[
F_A(h, \rho) = \{x \in \Delta_{hA} \mid d(x, \partial\Delta_{hA}) < \rho\}
\]

to be the region that is less than $\rho$ distance away from the boundary. Then for $h$ sufficiently
large
\[
|hA \cap F_A(h, \rho)| = C \cdot h + D
\]
where $C$ and $D$ are some constants independent of $h$.


Proof.
Case 1: $A$ contains an element that is in the interior of the convex hull of $A$.

Without loss of generality, we can assume that $0 \in A$ and that 0 is in the interior of the
convex hull of $A$, since we can consider the set $A - a$, where $a \in A$ is an interior element of
the convex hull of $A$.

Let $V_A$ be the set of vertices of $A$. Let

\[
V_A = \{a_i\}_{i=1}^{l}
\]

where $l = |V_A|$. Assume the consecutive vertices are adjacent vertices, and $a_l$ and $a_1$ are
adjacent vertices. Define

\[
\begin{aligned}
F_i(h, \rho) &= \{x \in \Delta(0, ha_i, ha_{i+1}) \setminus l(0, ha_{i+1}) \mid d(x, l(ha_i, ha_{i+1})) < \rho\} \quad \text{for } i = 1, \ldots, l-1 \\
F_l(h, \rho) &= \{x \in \Delta(0, ha_l, ha_1) \setminus l(0, ha_1) \mid d(x, l(ha_l, ha_1)) < \rho\}
\end{aligned}
\]

Apply Theorem 4.1 to each set $\Delta(0, ha_i, ha_{i+1})$ and $\Delta(0, ha_l, ha_1)$; thus, for all $i = 1, \ldots, l$,

\[
|hA \cap F_i(h, \rho)| = C_i h - D_i
\]

where $C_i$ and $D_i$ are constants.


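Let
\[
T_i(h) = \{n \in \mathbb{Z}^+ \mid d(n a_{i+1}, h a_{i+1}) < \rho\}
\]

Since the cardinalities of $U_i(h)$ and $V_i(h)$ are independent of $h$, the cardinality of $T_i(h)$ is also
independent of $h$. Therefore,

\[
\begin{aligned}
|hA \cap F_A(h, \rho)| &= \sum_{i=1}^{l} |hA \cap F_i(h, \rho)| - \sum_{i=1}^{l} |T_i(h)| \\
  &= Ch - D
\end{aligned}
\]

where $C = \sum_{i=1}^{l} C_i$ and $D = \sum_{i=1}^{l} \bigl( D_i + |T_i(h)| \bigr)$.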

Case 2: $A$ does not contain an element that is in the interior of the convex hull of $A$.
Without loss of generality we can assume $0 \in A$ and 0 is a vertex of the convex hull of
$A$, since we can consider the set $A - a$ where $a \in A$ is a vertex of the convex hull of $A$.
Let $V_A$ be the set of vertices of $A$. Let

\[
V_A = \{a_i\}_{i=1}^{l} \cup \{0\}
\]

where $l = |V_A \setminus \{0\}|$. Assume the consecutive vertices are adjacent vertices, and 0 is
adjacent to both $a_1$ and $a_l$. Define

\[
F_i(h, \rho) = \{x \in \Delta(0, ha_i, ha_{i+1}) \setminus l(0, ha_{i+1}) \mid d(x, l(ha_i, ha_{i+1})) < \rho\} \quad \text{for } i = 1, \ldots, l-1
\]
Apply Theorem 4.1 to each set $\Delta(0, ha_i, ha_{i+1})$; thus for all $i = 1, \ldots, l-1$,
\[
|hA \cap F_i(h, \rho)| = C_i h - D_i
\]
where $C_i$ and $D_i$ are constants.
Define the remaining two boundary regions of $\Delta_{hA}$ in the following way:

\[
\begin{aligned}
F_0(h, \rho) &= \{x \in \Delta(0, ha_1, ha_2) \mid d(x, l(0, ha_1)) < \rho\} \\
F_l(h, \rho) &= \{x \in \Delta(0, ha_{l-1}, ha_l) \mid d(x, l(0, ha_l)) < \rho\}
\end{aligned}
\]

To be able to apply Theorem 4.1 to the case of $|hA \cap F_0(h, \rho)|$ and $|hA \cap F_l(h, \rho)|$, we will
translate the set $A$ by $a_2$. Consider the set $A' = A - a_2$ and let $a'_0 = a_2 - a_2 = 0$; $a'_1 = a_1 - a_2$;
$a'_2 = a_0 - a_2$; and $a'_3 = a_l - a_2$. Applying Theorem 4.1 to $\{0, a'_1, a'_2\}$ and $\{0, a'_2, a'_3\}$, we have

\[
|hA \cap F_0(h, \rho)| = |hA' \cap F'_0(h, \rho)|
\]
and
\[
|hA \cap F_l(h, \rho)| = |hA' \cap F'_l(h, \rho)|
\]

Putting it all together we have for $i = 0$ and $l$,
\[
|hA \cap F_i(h, \rho)| = |hA' \cap F'_i(h, \rho)| = C_i h - D_i
\]


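Let

\[
\begin{aligned}
T_i(h) &= hA \cap F_i(h, \rho) \cap F_{i+1}(h, \rho) = U_i(h) \cap V_{i+1}(h) \quad \text{for } i = 0, 1, \ldots, l-1 \\
T_l(h) &= hA \cap F_l(h, \rho) \cap F_0(h, \rho) = U_l(h) \cap V_0(h)
\end{aligned}
\]

By Lemma 3.10, the elements of $U_i(h+1)$ and $V_i(h+1)$ are simply the translation of the
elements of $U_i(h)$ and $V_i(h)$ by $a_{i+1}$. Thus, the elements of $T_i(h+1)$ are also the translation
of the elements of $T_i(h)$ by $a_{i+1}$. Since the cardinalities of $U_i(h)$ and $V_i(h)$ are independent of
$h$, so is the cardinality of $T_i(h)$. The cardinality of $hA$ in the boundary region is:

\[
\begin{aligned}
|hA \cap F_A(h, \rho)| &= \sum_{i=0}^{l} |hA \cap F_i(h, \rho)| - \sum_{i=0}^{l} |T_i(h)| \\
  &= \sum_{i=0}^{l} \bigl( C_i \cdot h - D_i - |T_i(h)| \bigr) \\
  &= C \cdot h - D
\end{aligned}
\]

where $C = \sum_{i=0}^{l} C_i$ and $D = \sum_{i=0}^{l} \bigl( D_i + |T_i(h)| \bigr)$. □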


Define
\[
\Delta_{hA}(\rho) = \{x \in \Delta_{hA} \mid d(x, \partial\Delta_{hA}) \ge \rho\}
\]
to be the set of elements of $\Delta_{hA}$ whose distance from the boundary of $\Delta_{hA}$ is greater than
or equal to $\rho$.


Theorem 4.3 (Khovanskii) Let $A$ be a finite subset of $\mathbb{Z}^n$ such that $A$ generates $\mathbb{Z}^n$. Then
there exists a constant $\rho$ with the following property: for an arbitrary natural number $h$, every
lattice point of the polytope $\Delta_{hA}(\rho)$ belongs to $hA$.


Theorem 4.4 Let $A$ be a finite subset of $\mathbb{Z}^2$ such that $A$ generates $\mathbb{Z}^2$. For all sufficiently
large positive integers $h$, there exists a nonnegative real number $\rho$ independent of $h$ such that

\[
hA = (\Delta_{hA}(\rho) \cap \mathbb{Z}^2) \cup (F_A(h, \rho) \cap hA)
\]
where $|F_A(h, \rho) \cap hA|$ is a linear function of $h$.


Proof.
Let $\rho$ be the real number in Theorem 4.3. Thus $\Delta_{hA}(\rho) \cap \mathbb{Z}^2 \subseteq hA$. For all $h$ sufficiently
large,

\[
\begin{aligned}
hA &= (\Delta_{hA}(\rho) \cap hA) \cup (F_A(h, \rho) \cap hA) \\
   &= (\Delta_{hA}(\rho) \cap \mathbb{Z}^2) \cup (F_A(h, \rho) \cap hA)
\end{aligned}
\]

By Theorem 4.2, $|F_A(h, \rho) \cap hA|$ is a linear function of $h$. □
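The growth pattern behind Theorem 4.4 can be observed on a small example. The sketch below is only an illustration, not part of the proof: it builds the $h$-fold sumsets of a hypothetical three-point set and records $|hA|$. The names `sumset` and `h_fold_sizes` are made up for this example.

```python
def sumset(X, Y):
    """Return X + Y = {x + y : x in X, y in Y} for sets of integer pairs."""
    return {(x[0] + y[0], x[1] + y[1]) for x in X for y in Y}

def h_fold_sizes(A, h_max):
    """Return the list [|1A|, |2A|, ..., |h_max A|]."""
    sizes, current = [], {(0, 0)}      # 0A = {0}
    for _ in range(h_max):
        current = sumset(current, A)   # (h+1)A = hA + A
        sizes.append(len(current))
    return sizes

A = {(0, 0), (1, 0), (0, 1)}           # hypothetical set generating Z^2
sizes = h_fold_sizes(A, 8)
first = [b - a for a, b in zip(sizes, sizes[1:])]
second = [b - a for a, b in zip(first, first[1:])]
print(sizes)
print(second)
```

For this particular $A$ every lattice point of the triangle with vertices $0$, $(h, 0)$, $(0, h)$ lies in $hA$, so $|hA| = (h+1)(h+2)/2$ and the second differences are constant. This is consistent with a core of order $h^2$ plus a boundary contribution that is linear in $h$.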


In conclusion, Theorem 4.4 presents a general structure for $hA$ where $A$ is a finite
subset of $\mathbb{Z}^2$. The general structure is this: there exists a nonnegative real number $\rho$ such
that all lattice points whose distance from the boundary of $\Delta_{hA}$ is greater than or equal to $\rho$
belong to the "core" structure of $hA$, and those lattice points whose distance from the
boundary is less than $\rho$ belong to the boundary region. The distribution of these boundary
elements of $hA$ exhibits a regular pattern since they are related by a translation by an
element. Khovanskii had shown that the cardinality of those elements of $hA$ in the "core"
structure is a function of $h^n$ where $n$ is the dimension of the space; thus in the case of $\mathbb{Z}^2$,
the cardinality is $h^2$. The author has shown explicitly, by using simple counting principles,
that the cardinality of the boundary elements of $hA$ is $h^{n-1}$, which in the case of $\mathbb{Z}^2$ is a
linear function in $h$.


References 


[1] G. Ewald, Combinatorial Convexity and Algebraic Geometry, volume 168 of Graduate 
Texts in Mathematics. Springer-Verlag, New York, 1996. 


[2] S. Han, C. Kirfel, and M. Nathanson, "Linear forms in finite sets of integers," The
Ramanujan Journal 2 (1998), pp. 271-281.


[3] A.G. Khovanskii, "Newton polyhedron, Hilbert polynomial, and sums of finite sets,"
Funktsional. Anal. Prilozhen. 26 (1992), pp. 276-281.


[4] M.B. Nathanson, Additive Number Theory: Inverse Problems and the Geometry of Sum- 
sets, volume 165 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1996. 


ON NTU's IN FUNCTION FIELDS 
Howard Kleiman 


1. Introduction. Let g(x,y) be a polynomial of degree n in x with
coefficients in Z[y] which is absolutely irreducible. Suppose
that the coefficient of x^n is c' while g(0,y) = c with c, c' ∈ Z.
Let b be a root of g(x,y). Then b is an NTU (non-trivial unit) of
F = Q(y)(b)/Q(y). We can generalize Hilbert Class Field Theory
using NTUs. Also, by elementary algebra, there are but a
finite number of solutions of g(x,y) = 0 in Q × Z. We can
explicitly obtain them. Furthermore, using a computer algorithm,
we can explicitly obtain solutions in Q × Z of a large number of
minimal equations of F.


2. THEOREM. Let f(x,y) = 0 be a polynomial of degree n with
coefficients in Z[y] defining the field L = Q(y)(a)/Q(y). Assume
that f(x,y) is absolutely irreducible. Suppose that

(1) f(0,y) factors into cq(y) in Z[y] where |c| > 1,
c ∈ Z, c is not a perfect n-th power in Z, and q(y) is monic and
irreducible;

(2) there exist rational integers y' and y'' such that both
f(0,y') and f(0,y'') are irreducible over Q with indices
relatively prime to c;


(3) f(0,y') and f(0,y'') have c as their gcd;




D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 


(4) f(0,y*)/c (y* = y', y'') is relatively prime to c.
(5) If a' = a(y') and a'' = a(y''), the class fields of
L' = Q(a') and L'' = Q(a'') have Q as their intersection.


Then L contains an NTU.


Proof. Either a is divisible by an NTU, u, in L whose absolute
norm is c, or else there exists an algebraic integer A, unique to
within an algebraic unit, which divides a = a(y) for every
specialization, y*, of y into a rational integer. In the latter
case, if conditions (1) - (4) hold for y = y*, the ideal
(c, a*) = (A)* is a factor of (cq(y*)), y* ∈ Z. Therefore, (A)*
is contained in the class field of L* = Q(y*)(a*) (y* = y', y''),
implying that their respective class fields both contain (A) as
a subfield, contradicting (5). Therefore, a is divisible by an
NTU u (not necessarily in L). But, from (1), q(y) is an
irreducible element of Q[y]. Therefore, (a, q(y)) is a prime
ideal of degree 1, that is, a principal ideal in O_L. Therefore,
(a, q(y)) = (a). (a) is representable by the ideal number ua
where u is a unit in L. u is either an element of Q or an NTU.
But its absolute norm is ± c which isn't a perfect n-th power. It
follows that u is an NTU.


Continued Fractions and Quadratic Irrationals 


Joseph Lewittes¹


Department of Mathematics and Computer Science 
Lehman College - City University of New York 
Bronx, NY 10468 


lewittes@alpha.lehman.cuny.edu 


Most books on number theory contain a chapter or two on continued fractions, 
which really are indispensable in a number of areas. Nevertheless, they have not 
achieved a mainstream popularity and are often omitted in courses on number theory. Of 
course there are reasons for this; their basic construction strikes one as rather bizarre and 
they are notoriously impossible to manipulate with respect to the usual operations of 
arithmetic. Furthermore, they have no satisfactory generalization to fit into a more 
comprehensive framework. But, they are surprising and interesting. One of the key roles 
played by continued fractions is in the construction of units in real quadratic fields. In 
studying this topic we have found that the entire theory of units in such fields can in fact 
be derived via continued fractions. Also, the periods of the continued fractions associated 
with quadratic irrationals exhibit a certain pattern or structure when classified by 
discriminant, which has not been previously noted. 

In presenting these results I am also attempting an introductory exposition of 
continued fractions. This paper is meant to be self-contained in the sense that all basic 
terms will be explained; specific facts quoted from the literature can simply be accepted, 
for the moment, without impeding the reader's progress. Our focus is narrow,
concentrating on the continued fractions themselves, omitting any discussion of their 
intimate connections with ideal theory and binary quadratic forms. The following 
theorem may be considered an ultimate goal; eventually its statement will become 


meaningful, numerical examples will be given and perhaps the reader will be intrigued. 


¹Partially supported by a PSC-CUNY Research Grant.




Theorem Let $d$ be a discriminant and $\varepsilon = \dfrac{t + u\sqrt{d}}{2}$ the fundamental unit. Let $x$
be a reduced quadratic irrational with discriminant $d$, having the purely periodic
continued fraction $[\overline{b_0, b_1, \ldots, b_{n-1}}]$. Let $e$ be the number of terms $b_i$ that are even. Then
the parity of $e$ depends on $d$ only, as follows:
If $d$ is odd, then $e$ is even. If $d$ is even, then $e \equiv u \pmod 2$.

We formally designate as theorems those statements that contain new results. A
standard reference on continued fractions is Perron's book [5]; this is the third edition.
The classical results that we cite are all found there in the first three chapters. A recent 
book in English is [6]. For more on orders in quadratic fields and binary quadratic forms 
see [2], chapter 2 and [1], a more leisurely approach. But these do not integrate continued 
fractions into their discussion. While [4] does - see, in particular, the preface - it treats 


only the case of field discriminants. 


1. Finite Continued Fractions 

Continued fractions may be motivated by the Euclidean Algorithm, which we 
recall with a simple example. To find the greatest common divisor of 26 and 7 (denoted 
gcd(26,7) or just (26,7)) one performs successive integer divisions yielding quotient and 


remainder to obtain: 


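\[
\begin{aligned}
26 &= \underline{3} \times 7 + 5 \\
7 &= \underline{1} \times 5 + 2 \\
5 &= \underline{2} \times 2 + 1 \\
2 &= \underline{2} \times 1
\end{aligned}
\]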

The last non-zero remainder, 1, is (26,7); the quotients generated, 3, 1, 2, 2, underlined 
above, seem to be irrelevant. But, waste not want not! Dividing both sides of each 


equation, save the last, by the divisor, they become: 


\[
\frac{26}{7} = 3 + \frac{5}{7}, \qquad \frac{7}{5} = 1 + \frac{2}{5}, \qquad \frac{5}{2} = 2 + \frac{1}{2}.
\]
By successive substitutions one then obtains
\[
\frac{26}{7} = 3 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2}}} \qquad (1)
\]
and $\frac{26}{7}$ is now presented as a complicated compound fraction, but using only the
underlined quotients, and a 1 for each numerator. Such an expression is called a (finite)
continued fraction. In general, if $b_0, b_1, \ldots, b_{n-1}$ are any numbers then

\[
b_0 + \cfrac{1}{b_1 + \cfrac{1}{\ddots + \cfrac{1}{b_{n-2} + \cfrac{1}{b_{n-1}}}}} \qquad (2)
\]

is called a continued fraction. Note that $n \ge 1$ and if $n = 1$ we have only $b_0$ - but it still
qualifies technically as a continued fraction. We refer to $b_0, b_1, \ldots, b_{n-1}$ as the terms of the
continued fraction (the $n$ terms are indexed from 0 to $n-1$) though in the literature they are
called the 'partial quotients.' (2) is too bulky so we write it in condensed form
$[b_0, b_1, \ldots, b_{n-1}]$. As long as $b_1, \ldots, b_{n-1}$ are positive real numbers, which we assume from
now on, though $b_0$ may be $\le 0$, (2) has a well-defined numerical value, again designated
by the symbol $[b_0, b_1, \ldots, b_{n-1}]$. Thus (1) shows $\frac{26}{7} = [3, 1, 2, 2]$. Since $2 = 1 + \frac{1}{1}$,
$\frac{26}{7} = [3, 1, 2, 1, 1]$ also.


What worked for $\frac{26}{7}$ shows that any rational number $x$ can be expressed as a
continued fraction. Write $x = \frac{c}{d}$, $c$, $d$ integers, $d > 0$, not necessarily relatively prime, and
apply the successive divisions of the Euclidean Algorithm starting with division of $c$ by
$d$. If $n$ lines are produced with the quotients being $b_0, b_1, \ldots, b_{n-1}$, then $x = [b_0, b_1, \ldots, b_{n-1}]$.
Note that if $x$ is not an integer then $n > 1$ and the last term $b_{n-1} > 1$ so that also
$x = [b_0, b_1, \ldots, b_{n-1} - 1, 1]$. If $x$ is an integer then $n = 1$, the one line of the algorithm reads
$c = b_0 d$, with $b_0 = x$, so $x = [b_0] = [b_0 - 1, 1]$. In particular, every rational $x$ has two
continued fraction representations, one having an even number of terms and the other an
odd number. This observation plays an important role later on.
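The two representations of a rational number can be produced from one another and checked with exact arithmetic. This is a minimal sketch with hypothetical function names `evaluate` and `other_representation`; it assumes integer terms with all terms after the first positive.

```python
from fractions import Fraction

def evaluate(terms):
    """Value of [b_0, b_1, ..., b_{n-1}], computed from the last term upward."""
    value = Fraction(terms[-1])
    for b in reversed(terms[:-1]):
        value = b + 1 / value
    return value

def other_representation(terms):
    """Switch between the forms [..., b] (b > 1) and [..., b - 1, 1]."""
    if len(terms) > 1 and terms[-1] == 1:
        return terms[:-2] + [terms[-2] + 1]      # [..., b, 1] -> [..., b + 1]
    return terms[:-1] + [terms[-1] - 1, 1]       # [..., b]    -> [..., b - 1, 1]

short = [3, 1, 2, 2]
long_form = other_representation(short)
print(long_form, evaluate(short), evaluate(long_form))
```

Applying `other_representation` always changes the number of terms by exactly one, so one of the two forms has an even number of terms and the other an odd number.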




The reader will have noticed that we've been tacitly assuming that the terms $b_i$
are integers. This indeed is the case, but later on we will want to use the option of having
the last term $b_{n-1}$ not an integer. So this should be kept in mind, but from now on, unless
otherwise noted, the terms of the continued fraction will be integers only.


Actually, the two ways shown to represent $x$ as a continued fraction are the only
ways. This can be seen as follows. First note that if $b_0 + \dfrac{1}{[b_1, b_2, \ldots]}$ is any continued
fraction with three or more terms then its value is not an integer, since $[b_1, b_2, \ldots] > 1$; also
$b_0 + \dfrac{1}{b_1}$ is not an integer if $b_1 > 1$. Thus the only possible continued fractions for an
integer $x$ are those two given above. Suppose now $x$ is not an integer and $[c_0, c_1, \ldots, c_{m-1}]$ is
any continued fraction whose value is $x$. Then $x = c_0 + \dfrac{1}{y}$, $y = [c_1, \ldots, c_{m-1}]$ and $y > 1$
since $x$ is not an integer. Thus $c_0 = \lfloor x \rfloor$. Here, for any real number $t$, $\lfloor t \rfloor$ denotes the
greatest integer $\le t$, the unique integer $k$ such that $k \le t < k+1$. We use this notation
rather than the customary $[t]$ to minimize confusion with the square brackets used for
continued fractions. If also $x = [a_0, a_1, \ldots, a_{n-1}]$ we have $a_0 = \lfloor x \rfloor = c_0$, hence
$[a_1, \ldots, a_{n-1}] = [c_1, \ldots, c_{m-1}]$. If this common value is an integer we have only the two
possibilities described above, while if not, then $a_1 = c_1$ and we continue in the same way.


So far we've discussed how a rational number can be represented by a continued
fraction. We now turn around and ask, given $[b_0, b_1, \ldots, b_{n-1}]$, how should it be evaluated?
Consider, for example,
\[
[5, 4, 3, 6] = 5 + \cfrac{1}{4 + \cfrac{1}{3 + \cfrac{1}{6}}}.
\]
The natural inclination is to do it from the bottom up: $3 + \frac{1}{6} = \frac{19}{6}$, then $4 + \frac{6}{19} = \frac{82}{19}$
and finally $5 + \frac{19}{82} = \frac{429}{82}$. This is fine, but
eventually continued fractions will become 'infinitely long' by allowing the number of
terms to increase, and from this perspective it is better to evaluate from the top down. So
first calculate $[5] = \frac{5}{1}$, $[5, 4] = 5 + \frac{1}{4} = \frac{21}{4}$, $[5, 4, 3] = 5 + \cfrac{1}{4 + \frac{1}{3}} = \frac{68}{13}$, then finally,
$[5, 4, 3, 6] = \frac{429}{82}$. But how are all these related to each other and the final result? To
answer this, consider, for the moment, $b_0, b_1, \ldots, b_{n-1}$ as variables. Then $[b_0] = \frac{b_0}{1}$,
$[b_0, b_1] = b_0 + \frac{1}{b_1} = \frac{b_0 b_1 + 1}{b_1}$. If we set $A_0 = b_0$, $B_0 = 1$, $A_1 = b_1 b_0 + 1$, $B_1 = b_1$, then
\[
[b_0, b_1, b_2] = b_0 + \cfrac{1}{b_1 + \cfrac{1}{b_2}} = \frac{b_2 b_1 b_0 + b_2 + b_0}{b_2 b_1 + 1} = \frac{b_2 A_1 + A_0}{b_2 B_1 + B_0}
\]
which suggests defining $A_2 = b_2 A_1 + A_0$, $B_2 = b_2 B_1 + B_0$ so that $[b_0, b_1, b_2] = \frac{A_2}{B_2}$. This
scheme may then be continued. Formally we define, given the sequence $b_0, b_1, b_2, \ldots$,
A,=1, B,=0, 4.=5h, B=! (3) 
and inductively for i 21 


A,=bA.,+A_,, B,=0),B_,+B,, (4) 


{7 1-Ἱ ἐπ 1- 
The 4_,, Β᾽ are ἃ useful convenience. They, along with B, , are constants while for 
50, A,, B, are polynomials in },,6,,...,b,, but B, is independent of b,. A suitable 
induction argument then shows that for all » > J 
A ] 
-Ζ πὶ ς 
[-Ξ (5) 


n-} 


[..6᾽....,Ὁ 


n-1 


This is a formal algebraic identity which then remains true if one substitutes for by any 
number and for the 5, i> 0, any positive numbers ; so B,_, #0. Here number may 


mean integer or real but we are interested only in integers. Returning to the original 


question as to how to evaluate the continued fraction [δ δ᾽ .....Ὁ ] the answer is to 


n-l 


A 
calculate A,,B,...,.A,.,,B,_, successively, and then the value is —*!. This is done best 


n-] 


with a table which we present for our example [5, 4, 3, 6]. 


    i | -1 |  0 |  1 |  2 |   3
    b |    |  5 |  4 |  3 |   6
    A |  1 |  5 | 21 | 68 | 429        (6)
    B |  0 |  1 |  4 | 13 |  82
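The table computation can be sketched in a few lines of Python. This is only an illustration of the recurrences (3), (4), not code from the paper; the function names `convergents` and `evaluate_bottom_up` are hypothetical.

```python
from fractions import Fraction

def convergents(terms):
    """Return [(A_0, B_0), ..., (A_{n-1}, B_{n-1})] using (3) and (4)."""
    a_prev, b_prev = 1, 0            # A_{-1}, B_{-1}
    a_cur, b_cur = terms[0], 1       # A_0, B_0
    out = [(a_cur, b_cur)]
    for b in terms[1:]:
        a_prev, a_cur = a_cur, b * a_cur + a_prev
        b_prev, b_cur = b_cur, b * b_cur + b_prev
        out.append((a_cur, b_cur))
    return out

def evaluate_bottom_up(terms):
    """Evaluate the same continued fraction from the last term upward."""
    value = Fraction(terms[-1])
    for b in reversed(terms[:-1]):
        value = b + 1 / value
    return value

print(convergents([5, 4, 3, 6]))         # [(5, 1), (21, 4), (68, 13), (429, 82)]
print(evaluate_bottom_up([5, 4, 3, 6]))  # 429/82
```

The top-down version produces every intermediate value $[b_0, \ldots, b_i]$ along the way, which is what makes it suitable once the number of terms is allowed to grow.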




First the two rows labeled i and b are filled in; i acts as column index. Then the rows labeled A and B are filled in left to right, according to (3), (4). Then for any $i \ge 0$, the value of $[b_0, b_1, \ldots, b_i]$ is read off from the table as $\frac{A_i}{B_i}$.


2. Matrices and the Group G 


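The definition (3), (4) can be written $\begin{pmatrix} A_i & A_{i-1} \\ B_i & B_{i-1} \end{pmatrix} = \begin{pmatrix} A_{i-1} & A_{i-2} \\ B_{i-1} & B_{i-2} \end{pmatrix} \begin{pmatrix} b_i & 1 \\ 1 & 0 \end{pmatrix}$ for $i \ge 1$. If $i \ge 2$, then the first matrix on the right is similarly $\begin{pmatrix} A_{i-2} & A_{i-3} \\ B_{i-2} & B_{i-3} \end{pmatrix} \begin{pmatrix} b_{i-1} & 1 \\ 1 & 0 \end{pmatrix}$ and iterating this one obtains finally:

$$\begin{pmatrix} A_i & A_{i-1} \\ B_i & B_{i-1} \end{pmatrix} = \begin{pmatrix} b_0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} b_i & 1 \\ 1 & 0 \end{pmatrix}. \qquad (7)$$

We introduce the abbreviation

$$M[b_0, b_1, \ldots, b_{n-1}] = \begin{pmatrix} b_0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} b_{n-1} & 1 \\ 1 & 0 \end{pmatrix}. \qquad (8)$$

Thus we have the following continued fraction - matrix connection:

$$\begin{pmatrix} A_{n-1} & A_{n-2} \\ B_{n-1} & B_{n-2} \end{pmatrix} = M[b_0, b_1, \ldots, b_{n-1}]. \qquad (9)$$

Since the matrix $\begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix}$ has determinant -1 we immediately obtain

$$A_{n-1} B_{n-2} - A_{n-2} B_{n-1} = (-1)^n. \qquad (10)$$

Here is a nice immediate application of this matrix viewpoint. Since $\begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix}$ is symmetric, transposing both sides of (9) yields $\begin{pmatrix} A_{n-1} & B_{n-1} \\ A_{n-2} & B_{n-2} \end{pmatrix} = M[b_{n-1}, \ldots, b_1, b_0]$, so the value of the continued fraction obtained by taking the terms in reverse order is $[b_{n-1}, \ldots, b_1, b_0] = \frac{A_{n-1}}{A_{n-2}}$. Thus, noting (6), $[6,3,4,5] = \frac{429}{68}$.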
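As an illustrative check of (8), (9), (10) and of the transposition remark, the following minimal sketch multiplies the 2x2 factors directly. Matrices are represented as nested tuples; the names `mat_mul`, `M` and `det` are choices made for this example only.

```python
def mat_mul(m, n):
    """Product of two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def M(terms):
    """M[b_0, ..., b_{n-1}] as defined in (8)."""
    result = ((1, 0), (0, 1))
    for b in terms:
        result = mat_mul(result, ((b, 1), (1, 0)))
    return result

def det(m):
    (a, b), (c, d) = m
    return a * d - b * c

forward = M([5, 4, 3, 6])    # columns hold A_3, A_2 over B_3, B_2
backward = M([6, 3, 4, 5])   # the transpose of forward
print(forward, det(forward))
print(backward)
```

The first column of `backward` is (429, 68), which is the value 429/68 of [6,3,4,5] stated in the text.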


Basic to all that follows is the set G of all 2x2 matrices $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with a, b, c, d integers and $\det M = ad - bc = \pm 1$, which ensures that $M^{-1} \in G$ and G is a group with respect to matrix multiplication. Note that any two entries in the same row or column of M must be relatively prime and if a 0 occurs the other entries in its row and column must be $\pm 1$.

A notation that will be useful is to define, given a rational x, $CF_e[x]$ is the continued fraction for x with an even number of terms and $CF_o[x]$ is that with an odd number. For example, from (6), $CF_e\left[\frac{429}{82}\right] = [5,4,3,6]$ and $CF_o\left[\frac{429}{82}\right] = [5,4,3,5,1]$. If $\sigma = \pm 1$ we write $CF_\sigma$ for $CF_e$ or $CF_o$ according as $\sigma = 1$ or $-1$. The following result is a lemma that is implicit in the literature but not in the form we need it.


Lemma Let $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G$, $c > d > 0$. Let $\sigma = \det M$, $CF_\sigma\left[\frac{a}{c}\right] = [b_0, b_1, \ldots, b_{n-1}]$. Then $M = M[b_0, b_1, \ldots, b_{n-1}]$ and this representation - as an M matrix - is unique.

Proof Let $N = M[b_0, b_1, \ldots, b_{n-1}]$. We have to show M = N. By construction, $(-1)^n = \det N$ and $(-1)^n = \sigma = \det M$. Writing, as usual, $N = \begin{pmatrix} A_{n-1} & A_{n-2} \\ B_{n-1} & B_{n-2} \end{pmatrix}$ we have $\frac{A_{n-1}}{B_{n-1}} = [b_0, b_1, \ldots, b_{n-1}] = \frac{a}{c}$. Both fractions are reduced and $c > 0$, by hypothesis, and $B_{n-1} > 0$ follows from (3), (4). Thus $A_{n-1} = a$, $B_{n-1} = c$. Note that $n \ge 2$ since $c > d > 0$ makes $c > 1$, $\frac{a}{c}$ not an integer. $\det M = \det N$ now implies $ad - bc = aB_{n-2} - cA_{n-2}$ or $a(d - B_{n-2}) = c(b - A_{n-2})$. So c divides $a(d - B_{n-2})$, $(c,a) = 1$, hence c divides $d - B_{n-2}$. Now $1 \le d < c$ and also $1 \le B_{n-2} \le B_{n-1} = c$, so $|d - B_{n-2}| < c$ and being divisible by c this forces $d = B_{n-2}$ and then $b = A_{n-2}$, hence M = N. For the uniqueness, if also $M = M[c_0, c_1, \ldots, c_{k-1}]$ then we would have $\frac{a}{c} = \frac{A'_{k-1}}{B'_{k-1}}$, where $A'_{k-1}$, $B'_{k-1}$ are the quantities associated with $c_0, c_1, \ldots, c_{k-1}$. But this implies $\frac{a}{c} = [b_0, b_1, \ldots, b_{n-1}] = [c_0, c_1, \ldots, c_{k-1}]$ and $(-1)^n = \det M = (-1)^k$ so $k \equiv n \pmod 2$. But $\frac{a}{c}$ has only two continued fractions, one with an even number of terms and the other with an odd number, so k = n and $c_i = b_i$ for all i.
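The lemma can be tried out numerically. The sketch below is an illustration under stated assumptions, not the author's code: `cf_parity` computes $CF_e$ or $CF_o$ of a rational via the Euclidean algorithm (switching between the two representations by splitting the last term b into b-1, 1), and `M_of` rebuilds the matrix from the terms using (3), (4). All function names are hypothetical.

```python
def cf_parity(num, den, sigma):
    """CF_e[num/den] if sigma == 1, CF_o[num/den] if sigma == -1 (den > 0)."""
    terms = []
    while den:
        q, r = divmod(num, den)
        terms.append(q)
        num, den = den, r
    # Euclid ends on a term > 1 (or on a single term); split it if the
    # parity of the length is not the one requested.
    if (len(terms) % 2 == 0) != (sigma == 1):
        terms[-1:] = [terms[-1] - 1, 1]
    return terms

def M_of(terms):
    """((A_{n-1}, A_{n-2}), (B_{n-1}, B_{n-2})) for the given terms."""
    a_cur, a_prev, b_cur, b_prev = 1, 0, 0, 1
    for b in terms:
        a_cur, a_prev = b * a_cur + a_prev, a_cur
        b_cur, b_prev = b * b_cur + b_prev, b_cur
    return ((a_cur, a_prev), (b_cur, b_prev))

def lemma_terms(m):
    """Terms b_0..b_{n-1} with m = M[b_0, ..., b_{n-1}], assuming c > d > 0."""
    (a, b), (c, d) = m
    sigma = a * d - b * c
    assert sigma in (1, -1) and c > d > 0
    return cf_parity(a, c, sigma)

print(cf_parity(429, 82, 1), cf_parity(429, 82, -1))
print(lemma_terms(((135, 22), (43, 7))))
```

Only the first column and the determinant of M enter the computation; the lemma says the second column is then forced.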


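Now we are interested in the set $G^+$ of all matrices in G that can be expressed as a product of matrices $\begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix}$ where b is a positive integer. Thus

$$G^+ = \{M \in G \mid M = M[b_0, b_1, \ldots, b_{n-1}],\ n \ge 1,\ b_0, b_1, \ldots, b_{n-1} \text{ all positive integers}\}. \qquad (11)$$

$G^+$ is a subset - not subgroup - of G and is clearly closed under multiplication: M, $N \in G^+$ implies $MN \in G^+$. The identity matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \notin G^+$. Our first theorem gives another characterization of $G^+$.

Theorem $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G^+$ if and only if

$$a \ge b \ge d, \quad a \ge c \ge d, \quad d \ge 0. \qquad (12)$$

If $M \in G^+$, let $\sigma = \det M$, $CF_\sigma\left[\frac{a}{c}\right] = [b_0, b_1, \ldots, b_{n-1}]$. Then $M = M[b_0, b_1, \ldots, b_{n-1}]$ is the unique representation of M as a product of matrices of type $\begin{pmatrix} g & 1 \\ 1 & 0 \end{pmatrix}$, g a positive integer.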


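Proof Let, for the moment, G' be the set of matrices $M \in G$ satisfying the inequalities (12). Given $M_i = \begin{pmatrix} a_i & b_i \\ c_i & d_i \end{pmatrix}$, i = 1, 2, in G' set

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} = M_1 M_2 = \begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix} = \begin{pmatrix} a_1 a_2 + b_1 c_2 & a_1 b_2 + b_1 d_2 \\ c_1 a_2 + d_1 c_2 & c_1 b_2 + d_1 d_2 \end{pmatrix}.$$

Clearly $D \ge 0$. $C \ge D$ is equivalent to $c_1(a_2 - b_2) + d_1(c_2 - d_2) \ge 0$. Since $c_1 \ge d_1 \ge 0$ and $a_2 \ge b_2$, $c_2 \ge d_2$ it follows that $C \ge D$. In the same way one shows the other inequalities are satisfied to assure $\begin{pmatrix} A & B \\ C & D \end{pmatrix} \in G'$. Thus $M_1 M_2 \in G'$ and G' is closed under multiplication. Since $\begin{pmatrix} g & 1 \\ 1 & 0 \end{pmatrix}$, $g \ge 1$, is in G' it follows, recalling the definition of $G^+$, that $G^+ \subseteq G'$. We now show the reverse inclusion along with the uniqueness. Assume $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G'$. If $c > d > 0$ then $a \ge c > 1$, $\frac{a}{c}$ not an integer, so M has the unique representation stated, via the lemma, and $b_0 = \left\lfloor \frac{a}{c} \right\rfloor \ge 1$, so $M \in G^+$. If $c > d > 0$ fails, then, since $M \in G'$, either $1 = c > d = 0$ or $1 = c = d > 0$. In the former case we must have b = 1, $M = \begin{pmatrix} a & 1 \\ 1 & 0 \end{pmatrix}$ with $a \ge 1$. Then $\det M = -1$, $CF_o\left[\frac{a}{1}\right] = [a]$ so $M = M[a] \in G^+$ and the uniqueness is clear. If c = d = 1, $M = \begin{pmatrix} a & b \\ 1 & 1 \end{pmatrix}$, $\det M = a - b = \pm 1$. But $a \ge b$ then forces a = b + 1, $\det M = 1$. Thus $M = \begin{pmatrix} b+1 & b \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = M[b,1]$ and $CF_e\left[\frac{a}{c}\right] = CF_e\left[\frac{b+1}{1}\right] = [b,1]$. Thus again $M \in G^+$ and the uniqueness is clear. This completes the proof of the theorem.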


[ " [ "J 
An example: . 


135 
7 43 22 135)᾽ 


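are in G but not $G^+$, while $M = \begin{pmatrix} 135 & 22 \\ 43 & 7 \end{pmatrix} \in G^+$. Since $\det M = -1$ we calculate $CF_o\left[\frac{135}{43}\right] = [3,7,6]$ and so $M = \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 7 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 6 & 1 \\ 1 & 0 \end{pmatrix}$. On the other hand $N = \begin{pmatrix} 135 & 113 \\ 43 & 36 \end{pmatrix} \in G^+$, $\det N = 1$, we calculate $CF_e\left[\frac{135}{43}\right] = [3,7,5,1]$ and so $N = \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 7 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.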
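The theorem gives both a membership test for $G^+$, namely the inequalities (12), and a factorization procedure. A minimal sketch of the two, with hypothetical function names and matrices stored as nested tuples, might look as follows; it also checks closure under multiplication on one example.

```python
def mat_mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def M_of(terms):
    """Product of the matrices (b 1; 1 0) for b in terms."""
    result = ((1, 0), (0, 1))
    for b in terms:
        result = mat_mul(result, ((b, 1), (1, 0)))
    return result

def in_G_plus(m):
    """Determinant +-1 together with the inequalities (12)."""
    (a, b), (c, d) = m
    return abs(a * d - b * c) == 1 and a >= b >= d >= 0 and a >= c >= d

def factor(m):
    """Terms of CF_sigma[a/c], sigma = det m; assumes in_G_plus(m)."""
    (a, b), (c, d) = m
    sigma = a * d - b * c
    terms, num, den = [], a, c
    while den:
        q, r = divmod(num, den)
        terms.append(q)
        num, den = den, r
    if (len(terms) % 2 == 0) != (sigma == 1):
        terms[-1:] = [terms[-1] - 1, 1]   # switch to the other representation
    return terms

m1, m2 = M_of([3, 7, 6]), M_of([2, 5])
print(in_G_plus(m1), factor(m1))
print(factor(mat_mul(m1, m2)))   # concatenation of the two term lists
```

The boundary cases of the proof, c = 1 with d = 0 or d = 1, are handled by the same parity adjustment, since then a/c is an integer with the two representations [a] and [a-1, 1].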




3. Infinite Continued Fractions 
We've seen that the process for obtaining the continued fraction of a rational number can be expressed in terms of the greatest integer function. As such, it can also be applied to an irrational number x. Set $x_0 = x$, $b_0 = \lfloor x_0 \rfloor$. Then $b_0 < x_0 < b_0 + 1$ so setting $x_1 = \frac{1}{x_0 - b_0} > 1$ we have $x_0 = b_0 + \frac{1}{x_1}$. Again set $b_1 = \lfloor x_1 \rfloor$, $x_2 = \frac{1}{x_1 - b_1} > 1$ and continuing this way we generate a sequence $x_0, x_1, x_2, \ldots$ of real numbers and integers $b_0, b_1, b_2, \ldots$ such that for each $i \ge 1$, $x_{i-1} = b_{i-1} + \frac{1}{x_i}$, $x_i > 1$ and

$$x_0 = b_0 + \cfrac{1}{b_1 + \cfrac{1}{\ddots + \cfrac{1}{b_{i-1} + \cfrac{1}{x_i}}}} = [b_0, b_1, \ldots, b_{i-1}, x_i]. \qquad (13)$$

Since $x_0$ is irrational so is each $x_i$ and the subsequent $x_{i+1} = \frac{1}{x_i - b_i}$ is always well-defined. (13) expresses $x_0$ as a finite continued fraction but the last term $x_i$ is not an integer. Nevertheless, as noted earlier, the formalism of (3), (4) continues and one has for all $i \ge 1$

$$x_0 = [b_0, b_1, \ldots, b_{i-1}, x_i] = \frac{x_i A_{i-1} + A_{i-2}}{x_i B_{i-1} + B_{i-2}} \qquad (14)$$

where the $A_{i-1}$, $A_{i-2}$, $B_{i-1}$, $B_{i-2}$ depend only on the integers $b_0, b_1, \ldots, b_{i-1}$.
We now define an infinite continued fraction to be the formal infinite expression

$$b_0 + \cfrac{1}{b_1 + \cfrac{1}{b_2 + \cdots}} = [b_0, b_1, b_2, \ldots]$$

where $b_0$ is any integer and $b_1, b_2, \ldots$ an infinite sequence of positive integers. Thus we have now associated with every irrational x an infinite continued fraction, denoted CF[x]. Consider a numerical example. Take $x_0 = x = -\sqrt{3}$. Then $b_0 = -2$, $x_1 = \frac{1}{x_0 - b_0} = \frac{1}{2 - \sqrt{3}}$. At this point it is best to proceed with elementary algebra and rationalize the denominator: $x_1 = \frac{1}{2 - \sqrt{3}} \cdot \frac{2 + \sqrt{3}}{2 + \sqrt{3}} = \sqrt{3} + 2$, $b_1 = 3$. Leaving the simple steps to the reader we then have $x_2 = \frac{\sqrt{3}+1}{2}$, $b_2 = 1$, $x_3 = \sqrt{3} + 1$, $b_3 = 2$, $x_4 = \frac{\sqrt{3}+1}{2}$, $b_4 = 1$. But wait - $x_4 = x_2$, $b_4 = b_2$ so then $x_5 = x_3$, $b_5 = b_3$ and everything repeats after two steps. Doing it directly on a calculator, without rationalizing denominators, I find $x_1 = 3.732050807$, $x_2 = 1.366025404$, $x_3 = 2.732050806$, $x_4 = 1.366025407$, so $x_4 \ne x_2$! So we've found $CF[-\sqrt{3}] = [-2,3,1,2,1,2,\ldots] = [-2,3,\overline{1,2}]$, where the bar over the block 1,2 indicates its infinite repetition. In general, a continued fraction is called periodic if it has the form $[c_0, c_1, \ldots, c_{j-1}, \overline{b_0, b_1, \ldots, b_{k-1}}]$. The block $b_0, b_1, \ldots, b_{k-1}$ is called the period and $k \ge 1$ is the length of the period. $c_0, c_1, \ldots, c_{j-1}$ is the pre-period and may be absent, in which case the continued fraction is called purely periodic. Note that for $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$ purely periodic, the $b_i$, $x_i$ have period k: $b_{i+k} = b_i$, $x_{i+k} = x_i$ for all $i \ge 0$ and $b_i = b_j$, $x_i = x_j$ whenever $i \equiv j \pmod k$. It is tacitly assumed that the period is minimal, does not consist of a repeated shorter block, and starts 'as soon as possible.' Thus, though $CF[-\sqrt{3}]$ can be written $[-2,3,1,\overline{2,1,2,1}]$ the period is 1,2 and the pre-period -2,3. Because of periodicity we can describe all the terms. Euler found, for e the base of the natural logarithm, $CF[e] = [2,1,2,1,1,4,1,1,6,1,1,8,1,\ldots]$ and though not periodic there is a pattern after the initial $b_0 = 2$ and the terms are 'known.' On the other hand $CF[\pi] = [3,7,15,1,292,1,1,1,2,\ldots]$ and no one has found a pattern to predict the terms. Apparently CF[x] is not known in its entirety for any algebraic number of degree > 2.
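The rounding drift seen on the calculator can be avoided by keeping each complete quotient in the exact form $(P + \sqrt{D})/Q$ with integers P, Q. The sketch below is a standard integer-only scheme, offered as an illustration rather than as the paper's method; it assumes D is a positive non-square and that Q divides $D - P^2$. The names `floor_quadratic` and `cf_quadratic` are hypothetical.

```python
from math import isqrt

def floor_quadratic(P, Q, D):
    """floor((P + sqrt(D)) / Q) using integers only (D non-square, Q != 0)."""
    r = isqrt(D)                       # floor(sqrt(D))
    if Q > 0:
        return (P + r) // Q
    # For Q < 0 rewrite as (-P - sqrt(D)) / (-Q); floor(-sqrt(D)) = -r - 1.
    return (-P - r - 1) // (-Q)

def cf_quadratic(P, Q, D, count):
    """First `count` terms of CF[(P + sqrt(D)) / Q]."""
    assert (D - P * P) % Q == 0
    terms = []
    for _ in range(count):
        b = floor_quadratic(P, Q, D)
        terms.append(b)
        P = b * Q - P                  # x - b = (sqrt(D) - P_new) / Q
        Q = (D - P * P) // Q           # 1 / (x - b) = (P_new + sqrt(D)) / Q_new
    return terms

print(cf_quadratic(0, -1, 3, 8))   # -sqrt(3): [-2, 3, 1, 2, 1, 2, 1, 2]
```

Here $-\sqrt{3}$ is entered as $(0 + \sqrt{3})/(-1)$; the pairs (P, Q) visited are (2, 1), (1, 2), (1, 1), (1, 2), ..., so the repetition $x_4 = x_2$ shows up as an exact equality of integer pairs.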

So far infinite continued fractions are only formal objects - like infinite series before one has defined the notions of convergence and sum. Considering again (14) along with (10) we have $x_0 - \frac{A_{i-1}}{B_{i-1}} = \frac{A_{i-1} x_i + A_{i-2}}{B_{i-1} x_i + B_{i-2}} - \frac{A_{i-1}}{B_{i-1}} = \frac{\pm 1}{B_{i-1}(B_{i-1} x_i + B_{i-2})}$. Thus $\left| x_0 - \frac{A_{i-1}}{B_{i-1}} \right| = \frac{1}{B_{i-1}(B_{i-1} x_i + B_{i-2})} < \frac{1}{B_{i-1}^2}$ since $x_i > 1$ for $i \ge 1$. But $B_0 = 1 \le B_1 < B_2 < \cdots$ shows $B_i \to \infty$, $\lim_{i \to \infty} \frac{A_{i-1}}{B_{i-1}} = x_0$. Thus the 'partial' finite continued fractions $[b_0, b_1, \ldots, b_{i-1}] = \frac{A_{i-1}}{B_{i-1}}$ converge to a limit, the limit being $x_0$. Moreover it can be shown that given any sequence of integers $b_0, b_1, \ldots$ with $b_i$ positive for $i \ge 1$, then the finite continued fractions $[b_0, b_1, \ldots, b_{i-1}]$ tend to a limit as $i \to \infty$, which is then defined to be the value of the infinite continued fraction. In fact, the function that assigns to each infinite continued fraction its value is a bijection onto the set of all irrational numbers; if the value of $[b_0, b_1, \ldots]$ is x then $CF[x] = [b_0, b_1, \ldots]$. The rationals $\frac{A_{i-1}}{B_{i-1}}$ are called the convergents of x, and give 'best' rational approximations; their study is the topic of diophantine approximation.
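To make the error bound concrete, the following sketch lists the convergents of $-\sqrt{3}$ from its terms $[-2,3,1,2,1,2,\ldots]$ and compares each error with $1/B^2$. It uses floating point only to measure the error, which is adequate for the small denominators involved; the helper name `convergent_errors` is an assumption of this example.

```python
from math import sqrt

def convergent_errors(terms, x):
    """Rows (A, B, |x - A/B|) for the successive convergents of `terms`."""
    a_prev, b_prev = 1, 0            # A_{-1}, B_{-1}
    a_cur, b_cur = terms[0], 1       # A_0, B_0
    rows = [(a_cur, b_cur, abs(x - a_cur / b_cur))]
    for b in terms[1:]:
        a_prev, a_cur = a_cur, b * a_cur + a_prev
        b_prev, b_cur = b_cur, b * b_cur + b_prev
        rows.append((a_cur, b_cur, abs(x - a_cur / b_cur)))
    return rows

rows = convergent_errors([-2, 3, 1, 2, 1, 2, 1, 2], -sqrt(3))
for a, b, err in rows:
    print(f"{a}/{b}: error {err:.2e}, bound {1 / b**2:.2e}")
```

Each printed error stays below its bound, and the errors shrink from one convergent to the next.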
It was not just luck that $CF[-\sqrt{3}]$ turned out to be periodic. The classical theorem of Lagrange asserts that CF[x] is periodic if and only if x is a quadratic irrational - the root of a quadratic equation with rational coefficients. Furthermore, Galois showed that CF[x] is purely periodic if and only if x is reduced. Reduced means $x > 1$ and $-1 < x' < 0$, where x' is the conjugate of x, the other root of the equation satisfied by x. For example, from $CF[-\sqrt{3}]$ we had $x_2 = \frac{\sqrt{3}+1}{2} = [\overline{1,2}]$ and $x_2 > 1$, $-1 < x_2' = \frac{-\sqrt{3}+1}{2} < 0$.

Let I denote the set of irrational numbers, $I_2$ the quadratic irrationals and R the reduced ones. The $x_i$ generated by CF[x] are called the 'complete quotients' of x. The construction shows that if $CF[x] = [b_0, b_1, b_2, \ldots]$ then $CF[x_i] = [b_i, b_{i+1}, b_{i+2}, \ldots]$, which might be called a 'tail' of CF[x]. Recall the group G; $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G$ acts on I by sending x into $M(x) = \frac{ax+b}{cx+d}$. The map $G \times I \to I$ that sends (M,x) to M(x) is a standard group action. Under it I breaks up into G-orbits or equivalence classes; for $x, y \in I$ we say $y \sim x$ if y = M(x) for some $M \in G$. Thus (14) shows $x_i \sim x$ and Serret proved the beautiful result that describes the G-equivalence class of x in terms of continued fractions: $y \sim x$ if and only if for some integers $i, j \ge 0$, $y_j = x_i$, that is y, x have identical 'tails' from some point on.

From now on we shall be concerned only with quadratic irrationals and our goal is to show how the periodic structure of CF[x] reflects arithmetic properties. Numbers $x \in I_2$ are classified by their discriminant. This word requires careful definition since it is used in different ways in various contexts. x is the root of a quadratic equation with rational coefficients. By multiplying, dividing the equation by integers one can arrange it to be in the form $aX^2 + bX + c = 0$ with a, b, c integers, $a > 0$ and gcd(a,b,c) = 1; X is simply an indeterminate. This equation is now uniquely determined and we call it the standard equation for x. Then $d = b^2 - 4ac$ is called the discriminant of x, denoted d = disc(x). For example, with $x = -\sqrt{3}$, the standard equation is $X^2 - 3 = 0$ and d = 12. Note that any integer d arising as disc(x) is positive, since x is real, not a (perfect) square, since x is irrational and $d \equiv b^2 \equiv 0$ or 1 (mod 4). We call any positive integer d not a square and $\equiv 0$ or 1 (mod 4) a discriminant. In fact, d = disc(x) where x is a root of $X^2 - \frac{d}{4} = 0$, if $d \equiv 0 \pmod 4$, and a root of $X^2 - X - \frac{d-1}{4} = 0$ if $d \equiv 1 \pmod 4$. The first few discriminants are 5, 8, 12, 13, 17, 20, 21. Note, for later use, that if the discriminant $d = s^2 D$, s, D integers and $D \equiv 0$ or 1 (mod 4), then D is also a discriminant.
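As a small illustration of the definition, the sketch below computes the standard equation and the discriminant of a number given in the form $x = (P + \sqrt{D})/Q$. That input format, and the names `standard_equation` and `disc`, are assumptions made for this example; D is taken to be a positive non-square integer and Q nonzero.

```python
from math import gcd

def standard_equation(P, Q, D):
    """(a, b, c) with a > 0 and gcd(a, b, c) = 1 for x = (P + sqrt(D)) / Q."""
    # From (Q x - P)^2 = D:  Q^2 x^2 - 2 P Q x + (P^2 - D) = 0.
    a, b, c = Q * Q, -2 * P * Q, P * P - D
    g = gcd(gcd(a, b), c)
    return a // g, b // g, c // g

def disc(P, Q, D):
    a, b, c = standard_equation(P, Q, D)
    return b * b - 4 * a * c

print(standard_equation(0, -1, 3), disc(0, -1, 3))   # -sqrt(3)
print(standard_equation(5, 2, 37), disc(5, 2, 37))   # (5 + sqrt(37)) / 2
```

Since $Q^2 > 0$, the leading coefficient is automatically positive and only the common factor has to be removed.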


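Consider $y = \frac{1}{x} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}(x)$. If $aX^2 + bX + c = 0$ is the standard equation for x and d = disc(x), then $\pm(cX^2 + bX + a) = 0$ is the equation for y, so disc(y) = d. If $z = x - m = \begin{pmatrix} 1 & -m \\ 0 & 1 \end{pmatrix}(x)$, m an integer, then x = z + m satisfies $a(z+m)^2 + b(z+m) + c = 0$ from which one deduces that the standard equation for z is $AX^2 + BX + C = 0$ where A = a, B = 2am + b, $C = am^2 + bm + c$ whence disc(z) $= B^2 - 4AC = b^2 - 4ac = d$. Easier to see is that $-x = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}(x)$ has disc(-x) = d. Since $\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix}$ and the results of Section 2 enable one to show that any $M \in G$ may be expressed as a product of matrices of the type $\begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix}$, $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ it follows that if y = M(x), $M \in G$, then disc(y) = disc(x). Thus disc(x) is an invariant of the G-equivalence class of x. We also see directly from $x = x_0 = b_0 + \frac{1}{x_1}$ that disc(x) = disc($x_1$) and in general all the complete quotients $x_i$ of x have the same discriminant.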
Thus we've shown that for $x, y \in I_2$, $y \sim x$ implies disc(y) = disc(x). However, the converse is false, as we shall see. Given a discriminant d, let $I_2(d)$ be the set of $x \in I_2$ having disc(x) = d and R(d) the set of reduced x having disc(x) = d. By what we have just shown, $I_2(d)$ is a union of G-orbits. By Lagrange's theorem every $x \in I_2$ has some $x_i$ which is purely periodic, hence in R by Galois, thus every $x \in I_2(d)$ is G-equivalent to some $y \in R(d)$. In other words every G-orbit in $I_2(d)$ meets R(d). The number of G-orbits in $I_2(d)$ is called the class number of d, denoted h = h(d) and we will show h(d) is finite by showing R(d) is finite. But more than showing R(d) finite, we want to give an easy algorithm for listing the members of R(d).
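The invariance of the discriminant under $x \to 1/x$ and $x \to x - m$, derived above from the standard equation, can be checked mechanically on coefficient triples (a, b, c). The sketch below is illustrative only; the function names are hypothetical, and it follows the complete quotients of $-\sqrt{3}$ by alternating a shift with an inversion.

```python
from math import gcd

def normalize(a, b, c):
    """Divide out the gcd and make the leading coefficient positive."""
    g = gcd(gcd(a, b), c)
    if a < 0:
        g = -g
    return a // g, b // g, c // g

def equation_of_inverse(eq):
    """Standard equation of 1/x from that of x."""
    a, b, c = eq
    return normalize(c, b, a)

def equation_of_shift(eq, m):
    """Standard equation of x - m from that of x."""
    a, b, c = eq
    return normalize(a, 2 * a * m + b, a * m * m + b * m + c)

def disc(eq):
    a, b, c = eq
    return b * b - 4 * a * c

eq = (1, 0, -3)                                      # x_0 = -sqrt(3)
x1 = equation_of_inverse(equation_of_shift(eq, -2))  # x_1 = 1/(x_0 + 2)
x2 = equation_of_inverse(equation_of_shift(x1, 3))   # x_2 = 1/(x_1 - 3)
print(x1, x2, disc(eq), disc(x1), disc(x2))
```

The triples alone do not say which root of the equation is meant, so this checks only the discriminant, not the full continued fraction step.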


Theorem Let d be a discriminant. The set R(d) is finite and its members may be enumerated as follows.

Let g be that one of $\lfloor \sqrt{d} \rfloor$, $\lfloor \sqrt{d} \rfloor - 1$ that is $\equiv d \pmod 2$. For $p = g, g-2, g-4, \ldots$ and $> 0$ (so the last p is 2 or 1 according as d is even or odd) determine all ordered triples $(p, q, q^*)$ of positive integers satisfying

$$q, q^* \text{ are even}, \quad d = p^2 + qq^*, \quad g - p + 2 \le q \le g + p \quad \text{and} \quad \gcd(\tfrac{1}{2}q, p, \tfrac{1}{2}q^*) = 1. \qquad (15)$$

Assign to $(p, q, q^*)$ the number $x = \frac{\sqrt{d} + p}{q}$. Then $x \in R(d)$ and in this way each member of R(d) is listed exactly once. By this arrangement the standard equation for x is $(\tfrac{1}{2}q)X^2 - pX - (\tfrac{1}{2}q^*) = 0$.


Proof Suppose $x \in R(d)$ and has standard equation $aX^2 + bX + c = 0$. By the quadratic formula the roots are $\frac{-b + \sqrt{d}}{2a}$, $\frac{-b - \sqrt{d}}{2a}$ and the former is the larger root - by our standardization $a > 0$. But x reduced says $x > 1$, $-1 < x' < 0$ so x is the larger root, $x = \frac{-b + \sqrt{d}}{2a}$. Also $0 > xx' = \frac{c}{a}$ so $c < 0$ and $-\frac{b}{a} = x + x' > 1 - 1 = 0$ so $b < 0$. Setting p = -b, q = 2a, $q^* = -2c$ we have p, q, $q^*$ positive integers, q, $q^*$ even and $d = p^2 + qq^*$. With this notation $x = \frac{\sqrt{d} + p}{q}$ (we write the $\sqrt{d}$ first to emphasize the role of the discriminant) and the standard equation is $(\tfrac{1}{2}q)X^2 - pX - (\tfrac{1}{2}q^*) = 0$. We now reverse direction and ask, given p, q, $q^*$ positive integers, with q, $q^*$ even, $d = p^2 + qq^*$ and $\gcd(\tfrac{1}{2}q, p, \tfrac{1}{2}q^*) = 1$ will $x = \frac{\sqrt{d} + p}{q} \in R(d)$? Note that x is indeed a root of the standard equation $(\tfrac{1}{2}q)X^2 - pX - (\tfrac{1}{2}q^*) = 0$ and has discriminant d, so we only need to check if x is reduced. Now $x > 1$ is equivalent to $q < \sqrt{d} + p$, $x' = \frac{-\sqrt{d} + p}{q} < 0$ is equivalent to $p < \sqrt{d}$ and $x' > -1$ is equivalent to $\sqrt{d} - p < q$. Combining these we see that

$$0 < \sqrt{d} - p < q < \sqrt{d} + p \qquad (16)$$

must be satisfied. We now want to eliminate the $\sqrt{d}$. Since $d = p^2 + qq^*$, q, $q^*$ even, we have $p \equiv d \pmod 2$ and $0 < p < \sqrt{d}$, so that p is $\le$ the largest integer that is $< \sqrt{d}$ and $\equiv d \pmod 2$, which is precisely g as defined in the theorem. Thus the possible p are as described. By definition of g, $\sqrt{d} = g + \vartheta$ with $0 < \vartheta < 2$ and (16) is equivalent to $g - p + \vartheta < q < g + p + \vartheta$. Noting that g - p, q, g + p are all even this is equivalent to $g - p + 2 \le q \le g + p$.

Our description shows that the conditions of (15) are precisely what is needed to guarantee $x = \frac{\sqrt{d} + p}{q} \in R(d)$. Since there are only finitely many values of p and for each p only finitely many values of q, $q^*$ this shows R(d) finite and the proof is complete.
Note that $qq^* = d - p^2 = (\sqrt{d} - p)(\sqrt{d} + p)$ along with (16) shows that also $\sqrt{d} - p < q^* < \sqrt{d} + p$ so our conditions are symmetric with respect to q, $q^*$. Along with $x = \frac{\sqrt{d} + p}{q}$ in R(d) we also have $\frac{\sqrt{d} + p}{q^*} = -\frac{1}{x'} \in R(d)$. If we set $x^* = -\frac{1}{x'}$, then $x \to x^*$ is an involution of R(d).

Note from (15) that q = 2 can occur only when p = g. Since $d = g^2 + 2q^*$ determines the even integer $q^*$ uniquely and the gcd condition is obviously satisfied, we see that $\frac{\sqrt{d} + g}{2}$, which we denote as z or z(d), is the unique member of R(d) with q = 2.

Suppose in listing members of R(d) according to the theorem one has p, q, $q^*$ satisfying all conditions of (15) except $\gcd(\tfrac{1}{2}q, p, \tfrac{1}{2}q^*) = s > 1$. Then q = sQ, p = sP, $q^* = sQ^*$, with Q, $Q^*$ even and $D = P^2 + QQ^*$, $d = s^2 D$ so D is a discriminant and P, Q, $Q^*$ satisfy (15) for D. The inequalities in (15) are not so obvious though, but the proof showed they are equivalent to (16) and the terms there are homogeneous in s, so we obtain $0 < \sqrt{D} - P < Q < \sqrt{D} + P$. Thus $\frac{\sqrt{d} + p}{q} = \frac{\sqrt{D} + P}{Q} \in R(D)$. As a consequence, if d cannot be written as $s^2 D$, $s > 1$, $D \equiv 0$ or 1 (mod 4) the gcd condition of (15) needn't be checked since it always holds. Such d are called fundamental (or field) discriminants; the first few are 5, 8, 12, 13, 17, 21.


It's worth remarking that when presented with an $x \in I_2$, disc(x) is often not obvious. For example, $x = \frac{\sqrt{7} + 2}{4} > 1$ and $-1 < x' = \frac{-\sqrt{7} + 2}{4} < 0$, so $x \in R$. What is disc(x)? Find the standard equation: $(4x - 2)^2 = 7$, $16x^2 - 16x - 3 = 0$, d = 448. $x = \frac{\sqrt{448} + 16}{32} \in R(448)$. From now on we always write a reduced x in the form described in the theorem, $x = \frac{\sqrt{d} + p}{q}$, d = disc(x), and call this the standard form of x.

For d a discriminant we denote |R(d)|, the number of members of R(d), by r(d). Since $z(d) \in R(d)$, $r(d) \ge 1$. We consider some numerical examples.


d = 37 is a prime, so the gcd condition need not be checked. $\lfloor \sqrt{37} \rfloor = 6$, g = 5 and the range of p is p = 5, 3, 1.

p = 5 Solve $qq^* = 37 - 5^2 = 12$ with q, $q^*$ even and $2 \le q \le 10$. There are two solutions, 12 = 2*6 = 6*2.

p = 3 Solve $qq^* = 37 - 3^2 = 28$ with q, $q^*$ even and $4 \le q \le 8$. There are no solutions.

p = 1 Solve $qq^* = 37 - 1^2 = 36$ with q, $q^*$ even and $6 \le q \le 6$. There is only one solution, 36 = 6*6. Thus r(37) = 3,

$$R(37) = \left\{ z = \frac{\sqrt{37} + 5}{2},\ x = \frac{\sqrt{37} + 5}{6},\ y = \frac{\sqrt{37} + 1}{6} \right\}.$$

Note that $x^* = z$ and $y^* = y$.

d = 72 (not a fundamental discriminant) $\lfloor \sqrt{72} \rfloor = 8$, g = 8, p = 8, 6, 4, 2.

p = 8 $qq^* = 72 - 8^2 = 8$, $2 \le q \le 16$, has solutions, 8 = 2*4 = 4*2, $\gcd(\tfrac{1}{2}(2), 8, \tfrac{1}{2}(4)) = 1$.

p = 6 $qq^* = 72 - 6^2 = 36$, $4 \le q \le 14$, has one solution, 36 = 6*6, but $\gcd(\tfrac{1}{2}(6), 6, \tfrac{1}{2}(6)) = 3$; reject.

p = 4 $qq^* = 72 - 4^2 = 56$, $6 \le q \le 12$, no solutions.

p = 2 $qq^* = 72 - 2^2 = 68$, $8 \le q \le 10$, no solutions. Thus r(72) = 2,

$$R(72) = \left\{ z = \frac{\sqrt{72} + 8}{2},\ x = \frac{\sqrt{72} + 8}{4} \right\}.$$


The function r(d) is extremely erratic. The reader may wish to calculate r(281) and r(293). For larger d one has r(1109) = 11, r(1129) = 51, r(1181) = 9, r(1201) = 53;
all these discriminants are prime. These results were not obtained by hand but by a 
computer program implementing the algorithm of the theorem. I want to thank my 


Lehman College student Mr. E. Moss for his assistance with the computer work. 
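The enumeration in the theorem translates almost line by line into a short program. The following is a minimal sketch written for this text, not the program mentioned above; the function name `reduced_forms` and the triple format are choices made here.

```python
from math import gcd, isqrt

def reduced_forms(d):
    """Triples (p, q, q_star) satisfying (15); each one represents the
    reduced number (sqrt(d) + p) / q in R(d)."""
    assert d > 0 and d % 4 in (0, 1) and isqrt(d) ** 2 != d
    g = isqrt(d)
    if (g - d) % 2:              # make g congruent to d mod 2
        g -= 1
    triples = []
    for p in range(g, 0, -2):
        n = d - p * p            # must equal q * q_star
        for q in range(g - p + 2, g + p + 1, 2):     # even q in the allowed range
            if n % q == 0 and (n // q) % 2 == 0:
                q_star = n // q
                if gcd(gcd(q // 2, p), q_star // 2) == 1:
                    triples.append((p, q, q_star))
    return triples

print(reduced_forms(37))        # [(5, 2, 6), (5, 6, 2), (1, 6, 6)]
print(reduced_forms(72))        # [(8, 2, 4), (8, 4, 2)]
print(len(reduced_forms(1109)))
```

For d = 37 and d = 72 the output matches the hand computations above, and the count for d = 1109 agrees with the value r(1109) = 11 quoted in the text.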


If $x \in R(d)$, $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$ is purely periodic, $x_0 = x, x_1, \ldots, x_{k-1}$ are all $\sim x$ and in R(d). Suppose also $y \in R(d)$, $CF[y] = [\overline{c_0, c_1, \ldots, c_{m-1}}]$. If $y \sim x$, by Serret's theorem $y_j = x_i$, for some $i, j \ge 0$. But then for any $n \ge 0$, $y_{j+n} = x_{i+n}$ so choose n so that j + n = sm, a multiple of m. By periodicity $y_{j+n} = y_0 = y$, so $y = x_{i+n}$, and reducing i + n mod k, say $i + n \equiv \ell \pmod k$, $0 \le \ell \le k-1$, $x_{i+n} = x_\ell$, or $y = x_\ell$ is one of the k distinct complete quotients of x. Thus from $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$ we deduce that the G-orbit of x meets R(d) in the set $\{x = x_0, x_1, \ldots, x_{k-1}\}$. We refer to this as the cycle of x, since it comes with a specific ordering determined by CF[x]. If one had started with some $x_i$, $0 \le i \le k-1$, $CF[x_i] = [\overline{b_i, b_{i+1}, \ldots, b_{k-1}, b_0, \ldots, b_{i-1}}]$ and the cycle of $x_i$ is $\{x_i, x_{i+1}, \ldots, x_{k-1}, x_0, \ldots, x_{i-1}\}$. Thus the ordering is determined uniquely up to a cyclic permutation. If one then partitions R(d) into cycles, the number of cycles is precisely h(d), the class number of d, defined earlier as the number of G-orbits of $I_2(d)$.


For example, starting with $z = z_0 = \frac{\sqrt{37} + 5}{2}$ obtained above one finds $CF[z_0] = [\overline{5,1,1}]$, $z_1 = x$, $z_2 = y$, in the notation of that example, {z, x, y} is the only cycle and h(37) = 1. Observe that the period 5, 1, 1 after the first term is a palindrome, 1, 1, which reads the same backwards as forwards. This phenomenon always occurs for CF[z] for any d. $CF[y] = [\overline{1,5,1}]$ has the whole period as a palindrome, and $y = \frac{\sqrt{37} + 1}{6}$, $37 = 1^2 + 6^2$. Again this will always happen when $q = q^*$, $d = p^2 + q^2$. In the literature these observations are usually made in connection with the study of $CF[\sqrt{n}]$, n not a square. We have examined these special symmetries in a recent paper [3].
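The partition of R(d) into cycles, and hence h(d), can be computed by following the continued fraction step on standard forms: for $x = (\sqrt{d} + p)/q$ the next complete quotient is $(\sqrt{d} + p_1)/q_1$ with $b = \lfloor x \rfloor$, $p_1 = bq - p$, $q_1 = (d - p_1^2)/q$. The sketch below is an illustration under that assumption; it repeats the enumeration of R(d) so that it is self-contained, and all names are hypothetical.

```python
from math import gcd, isqrt

def reduced_forms(d):
    """Pairs (p, q) such that (sqrt(d) + p) / q runs over R(d), via (15)."""
    g = isqrt(d)
    if (g - d) % 2:
        g -= 1
    forms = []
    for p in range(g, 0, -2):
        n = d - p * p
        for q in range(g - p + 2, g + p + 1, 2):
            if n % q == 0 and (n // q) % 2 == 0:
                if gcd(gcd(q // 2, p), n // q // 2) == 1:
                    forms.append((p, q))
    return forms

def cycles(d):
    """List of (cycle, period): the members of one cycle with its CF period."""
    remaining = set(reduced_forms(d))
    result = []
    while remaining:
        # Start from the largest p and smallest q, so z(d) is visited first.
        start = max(remaining, key=lambda t: (t[0], -t[1]))
        cycle, period, cur = [], [], start
        while True:
            p, q = cur
            b = (p + isqrt(d)) // q            # floor of (sqrt(d) + p) / q
            cycle.append(cur)
            period.append(b)
            remaining.discard(cur)
            p = b * q - p
            cur = (p, (d - p * p) // q)
            if cur == start:
                break
        result.append((cycle, period))
    return result

def class_number(d):
    return len(cycles(d))

print(cycles(37))         # one cycle z, x, y with period [5, 1, 1]
print(class_number(40))   # 2
```

For d = 40 the four members of R(40) split into a cycle of length 1 and a cycle of length 3, a small example where disc(y) = disc(x) does not imply $y \sim x$.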


The function h(d) is also quite erratic. Something is known concerning its divisibility by powers of 2 from Gauss' genus theory of binary quadratic forms. Gauss also conjectured that h(d) = 1 occurs infinitely often and this still remains an open question. For the d mentioned earlier, h(1109) = 1, h(1129) = 9, h(1181) = 1, h(1201) = 1.


4. Quadratic Fields

Significant aspects of purely periodic continued fractions can only be understood in the context of algebraic number theory. Actually, all that is needed are some basic notions of that theory as they pertain to real quadratic fields. We shall take these for granted, and recall a few points to fix the ideas.

Let K be a real quadratic field, $O_K$ the ring of algebraic integers in K. If $x, y \in K$ are linearly independent over the rationals they form a basis for K/Q; every $\lambda \in K$ has a unique representation as $\lambda = rx + sy$, $r, s \in Q$. If we restrict ourselves to those $\lambda$ for which $r, s \in Z$ the resulting set L is a subgroup of the additive group of K and is a free abelian group of rank 2. We write L = (x, y) to indicate the construction of L via the basis x, y for L/Z. L is called a (full) module in K. An order in K is a full module that is also a subring of K and contains the ring Z. (This has nothing to do with the notion of order in the sense of 'less than' or 'greater than.') If L is a full module the set $\{\xi \in K \mid \xi L \subseteq L\}$ is called the coefficient ring of L. Here $\xi L$ is the set of all elements $\xi\lambda$, $\lambda$ ranging over L. The coefficient ring of L is in fact an order.


Every K is obtained uniquely (up to isomorphism) as $K = Q(\sqrt{m})$ where m is an integer > 1 and square free. Set

$$\omega = \begin{cases} \dfrac{1 + \sqrt{m}}{2}, & \text{if } m \equiv 1 \pmod 4 \\ \sqrt{m}, & \text{if } m \equiv 2, 3 \pmod 4 \end{cases} \qquad (17)$$

Then 1, $\omega$ is a basis for K/Q and $O_K = (1, \omega)$. Every order O is contained in $O_K$ and is uniquely determined by its index $f = [O_K : O]$ and then $O = (1, f\omega)$. For $x \in K$, irrational, $x' \ne x$ and $x' \in K$. Defining x' = x for $x \in Q$, the map $x \to x'$ is the non-trivial automorphism of K/Q; it and the identity map constitute the Galois Group of K/Q. The discriminant of O, disc(O), is defined to be $\begin{vmatrix} x & y \\ x' & y' \end{vmatrix}^2 = (xy' - x'y)^2$, where x, y is a basis for O. It is independent of the choice of basis. Using the basis 1, $f\omega$ one obtains disc(O) $= f^2(\omega - \omega')^2$. By (17) then

$$\mathrm{disc}(O) = \begin{cases} f^2 m, & \text{if } m \equiv 1 \pmod 4 \\ 4f^2 m, & \text{if } m \equiv 2, 3 \pmod 4. \end{cases} \qquad (18)$$

Thus disc(O) = d, a positive integer, not a perfect square; m is the square free part of d. For f = 1, $O = O_K$, d is m or 4m, according as $m \equiv 1 \pmod 4$ or not. This also is called the discriminant of K, denoted $d_K$. The numbers $d_K$ are the fundamental or field discriminants mentioned earlier.
It should now be clear that if we allow O to range over all orders of all quadratic fields the resulting set of integers {disc(O)} is precisely the set of positive integers not perfect squares that are $\equiv 0$ or 1 (mod 4), i.e. the set of numbers called 'discriminants' in the previous section arising as disc(x) $= b^2 - 4ac$. Also, given d the numbers m, f are uniquely recoverable, showing that there is a unique O with disc(O) = d. We denote by $O_d$ the order with discriminant d. For example, what is $O_{12000}$?

Write $12000 = 2^4 \cdot 5^2 \cdot 2 \cdot 3 \cdot 5$, so m = 30 is the square free part. $30 \equiv 2 \pmod 4$ so write $12000 = 4(2 \cdot 5)^2 \cdot 30$ showing f = 10. Thus $O_{12000} = (1, 10\sqrt{30})$ contained in $Q(\sqrt{30})$.
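Recovering m and f from d, as in the example, amounts to extracting the square free part and then applying (18). A minimal sketch, using plain trial division and a hypothetical function name, could read:

```python
def field_and_index(d):
    """Return (m, f) with d = f^2 m if m = 1 (mod 4), d = 4 f^2 m otherwise.
    Assumes d is a discriminant in the sense of the text."""
    m, s = d, 1                      # invariant: d == s * s * m
    k = 2
    while k * k <= m:
        while m % (k * k) == 0:      # strip square factors
            m //= k * k
            s *= k
        k += 1
    if m % 4 == 1:
        return m, s
    return m, s // 2                 # here s is even, since d = 0 or 1 (mod 4)

print(field_and_index(12000))   # (30, 10)
print(field_and_index(72))      # (2, 3)
```

A discriminant is fundamental exactly when the returned f equals 1, for instance d = 12 gives (3, 1) while d = 20 gives (5, 2).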


Since O is determined by its discriminant (we find) it is preferable to express the elements of $O_d$ 'canonically' in terms of d, rather than use the somewhat arbitrary basis 1, $f\omega$. We claim

$$\lambda \in O_d \text{ if and only if } \lambda = \frac{r + s\sqrt{d}}{2},\ r, s \in Z \text{ and } r \equiv sd \pmod 2. \qquad (19)$$

Note the congruence condition, so this is not a basis representation.

To prove the claim, suppose $m \equiv 1 \pmod 4$. Then $d = f^2 m \equiv f^2 \equiv f \pmod 2$ and $f\omega = \frac{f + f\sqrt{m}}{2} = \frac{f + \sqrt{d}}{2}$. Hence $\lambda \in O_d$ iff $\lambda = t + s \cdot f\omega = \frac{(2t + sf) + s\sqrt{d}}{2}$ with $t, s \in Z$, or iff $\lambda = \frac{r + s\sqrt{d}}{2}$ with $r, s \in Z$ and $r \equiv sf \equiv sd \pmod 2$. The cases $m \equiv 2, 3 \pmod 4$ are proved similarly. Note that with $\lambda = \frac{r + s\sqrt{d}}{2} \in O_d$ then $N\lambda$, the norm of $\lambda$, $= \lambda\lambda' = \frac{r^2 - ds^2}{4}$ and this is an integer. Also we see directly that $\lambda \in O_d$ iff $\lambda' \in O_d$.


The units in K are the algebraic integers $\tau$ for which $\frac{1}{\tau}$ is also an algebraic integer. Thus the units are the invertible elements in $O_K$ and form a group $U_K$, a subgroup of $K^*$, the multiplicative group of non-zero elements of K. $\tau \in O_K$ is a unit iff $N\tau = \tau\tau' = \pm 1$ which is iff its standard equation has the form $X^2 + bX \pm 1 = 0$.

If $O_d$ is an order in K and $\tau \in O_d \cap U_K$ then $\frac{1}{\tau} = \pm\tau'$ shows that $\tau$ is an invertible element of $O_d$. Thus it follows that $O_d \cap U_K$ is the group of invertible elements in $O_d$, denoted $U_d$, called the unit group for discriminant d. By (19) we have that $\tau = \frac{r + s\sqrt{d}}{2} \in U_d$ iff r, s satisfy the Pell equation $r^2 - ds^2 = \pm 4$. Now it is not at all obvious that this equation has non-trivial solutions, i.e. other than $r = \pm 2$, s = 0, so not at all obvious that there are any units $\tau$ other than the trivial ones $\pm 1$, which are the units for the rational field Q with the ring of integers Z. What we want to show here is that the existence of non-trivial units follows as a direct consequence of results about continued fractions, and the two theories shed light on each other.
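Let $x \in K$ be irrational, L the full module (x, 1). x, 1 is a basis for K/Q so multiplication by $\lambda \in K$ is represented by

$$\begin{aligned} \lambda \cdot x &= \alpha x + \beta \cdot 1 = \alpha x + \beta \\ \lambda = \lambda \cdot 1 &= \gamma x + \delta \cdot 1 = \gamma x + \delta \end{aligned} \qquad (20)$$

with $\alpha, \beta, \gamma, \delta$ rational. As long as $\lambda \ne 0$, the matrix $M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ is invertible; $M \in GL(2,Q)$. We denote $M = \rho(\lambda)$ and then the map $\rho: K^* \to GL(2,Q)$ is an injective homomorphism - the proof makes use of the commutativity of $K^*$. Recall that $N\lambda = \det \rho(\lambda)$. $\rho$ really should be denoted $\rho_{x,1}$, to show its dependence on the basis x, 1.

Let $aX^2 + bX + c = 0$ be the standard equation for x and d = disc(x). Let O be the coefficient ring of L; by (20), $\lambda \in O$ iff $\alpha, \beta, \gamma, \delta \in Z$. Thus for $\lambda \in O$, $\lambda = \gamma x + \delta$ and $(\gamma x + \delta)x = \alpha x + \beta$, or $\gamma x^2 + (\delta - \alpha)x - \beta = 0$. Comparing with the standard equation this forces $\gamma = na$ for some integer n, $\lambda = n(ax) + \delta \in (ax, 1)$. Conversely, one sees that $\lambda \in (ax, 1)$ implies $\lambda \in O$. Thus O = (ax, 1) and disc(O) $= \begin{vmatrix} ax & 1 \\ ax' & 1 \end{vmatrix}^2 = a^2(x - x')^2 = a^2\left(\frac{\pm\sqrt{d}}{a}\right)^2 = d$. This proves the crucial connection

$$\text{if disc}(x) = d \text{ then } O_d \text{ is the coefficient ring of } (x, 1). \qquad (21)$$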


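Let $\Omega$ be the set of matrices $M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \in GL(2,Q)$ having $\alpha, \beta, \gamma, \delta \in Z$. For a non-zero integer n let $\Omega_n = \{M \in \Omega \mid \det(M) = n\}$; so our group $G = \Omega_1 \cup \Omega_{-1}$. Suppose now $\lambda \in O$, as above, $N\lambda = n$ and $\rho(\lambda) = M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$. Then $M \in \Omega_n$ and $x = \frac{\lambda x}{\lambda} = \frac{\alpha x + \beta}{\gamma x + \delta} = M(x)$; x is fixed under the action of M as a Mobius transformation. Conversely, suppose $M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \in \Omega_n$ and $x = M(x) = \frac{\alpha x + \beta}{\gamma x + \delta}$. Set $\lambda = \gamma x + \delta$, then $\lambda x = (\gamma x + \delta)x = \alpha x + \beta$ so $\lambda \in O$, $M = \rho(\lambda)$. If we make the definition $\Lambda_n = \{\lambda \in O \mid N\lambda = n\}$, and for any set of matrices $S \subseteq GL(2,Q)$, $S_x = \{M \in S \mid M(x) = x\}$ then we have shown that $\rho$ maps $\Lambda_n$ one-one onto $\Omega_{n,x}$. In particular with $U = U_d = \Lambda_1 \cup \Lambda_{-1}$, $\rho$ maps U bijectively onto $\Omega_{1,x} \cup \Omega_{-1,x} = G_x$. Since $\rho$ is also a homomorphism we have that $\rho: U \to G_x$ is an isomorphism. The inverse isomorphism we denote by $\varphi: G_x \to U$; for $M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \in G_x$, $\varphi(M) = \gamma x + \delta \in U$. Thus if the group structure of $G_x$ is determined then so is that of U.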


So far the only property of x that was used is that disc(x) = d, so we are free to choose any x subject to this condition. We choose $x \in R(d)$, $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$. For $i \ge 1$ set $M_{i-1} = M[b_0, b_1, \ldots, b_{i-1}] = \begin{pmatrix} A_{i-1} & A_{i-2} \\ B_{i-1} & B_{i-2} \end{pmatrix}$. By (14), $x = M_{i-1}(x_i)$. With i = k, set $M = M_{k-1} = M[b_0, b_1, \ldots, b_{k-1}]$ and $x_k = x_0 = x$ we have x = M(x), $M \in G_x$. Since $M \in G^+$, it has infinite order and this proves the existence of non-trivial units. Let $\Gamma = \{\pm I\} \times \{M^n\} = \{\pm M^n\}$. Clearly $\Gamma \subseteq G_x$ and we now show that actually $\Gamma = G_x$.


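Suppose $V \in G_x$, $V \ne \pm I$. Let $\varphi(V) = \xi \ne \pm 1$. Then $\varphi(\pm V) = \pm\xi$, $\varphi(\pm V^{-1}) = \pm\frac{1}{\xi}$ and one of these is > 1. Let W be that one of $\pm V$, $\pm V^{-1}$ for which $\varphi(W) > 1$. It will suffice now to show $W \in \Gamma$. Suppose $W = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$, $\eta = \varphi(W) = \gamma x + \delta$, $\eta > 1$ and $\eta \in U$. Thus $\eta' = \pm\frac{1}{\eta}$, $|\eta'| < 1$, then $0 < \eta - \eta' = \gamma(x - x')$ shows $\gamma > 0$ (recall x here is reduced). Also $\delta = \eta' - \gamma x' = \eta' + |\gamma x'| > \eta' > -1$ shows $\delta \ge 0$. Also $-1 < x'$ implies $-\gamma + \delta < \gamma x' + \delta = \eta' < 1$, hence $-\gamma + \delta \le 0$, $\delta \le \gamma$. Altogether then $0 \le \delta \le \gamma$. We now consider various cases.

If $0 < \delta < \gamma$ then by the lemma of section 2, setting $\sigma = \det W$, $CF_\sigma\left[\frac{\alpha}{\gamma}\right] = [c_0, c_1, \ldots, c_{n-1}]$ we have $W = M[c_0, c_1, \ldots, c_{n-1}]$. Again using (14), the value of the continued fraction $[c_0, c_1, \ldots, c_{n-1}, x] = W(x)$ and W(x) = x, since $W \in G_x$. Thus $x = [c_0, c_1, \ldots, c_{n-1}, x]$ which says $[\overline{b_0, b_1, \ldots, b_{k-1}}] = [c_0, c_1, \ldots, c_{n-1}, \overline{b_0, b_1, \ldots, b_{k-1}}]$. Since the continued fraction of an irrational is unique this forces $c_0, c_1, \ldots, c_{n-1}$ to be a j-fold repetition of the period block $b_0, b_1, \ldots, b_{k-1}$, for some $j \ge 1$. Thus $W = M^j \in \Gamma$.

Remaining are the cases $0 = \delta < \gamma = 1$ and $0 < \delta = \gamma = 1$. Rather than consider these directly it is easier to consider separately the two cases (a) $N\eta = -1$ (b) $N\eta = 1$. For (a) $N\eta = -1$, $\eta' = -\frac{1}{\eta} < 0$. Our previous analysis showed $-\gamma + \delta < \eta'$ hence now $-\gamma + \delta < 0$, $\delta < \gamma$ so we must have $\delta = 0$, $\gamma = 1$, $W = \begin{pmatrix} \alpha & \beta \\ 1 & 0 \end{pmatrix}$. But $\eta = \varphi(W)$, $\rho(\eta) = W$, $\det W = N\eta = -1$ so $\beta = 1$. Thus $W = \begin{pmatrix} \alpha & 1 \\ 1 & 0 \end{pmatrix}$, $x = W(x) = \alpha + \frac{1}{x}$, hence $\alpha = \lfloor x \rfloor$, $CF[x] = [\overline{\alpha}]$. Hence k = 1, $b_0 = \alpha$, W = M.

For (b), $\eta' = \frac{1}{\eta} > 0$. We saw previously that $\delta > \eta'$, hence now $\delta > 0$, so we must have $\delta = \gamma = 1$, $W = \begin{pmatrix} \alpha & \beta \\ 1 & 1 \end{pmatrix}$. But $\det W = N\eta = 1$ then implies $\alpha - \beta = 1$, $W = \begin{pmatrix} \beta+1 & \beta \\ 1 & 1 \end{pmatrix}$ and $x = W(x) = \frac{(\beta+1)x + \beta}{x + 1} = \beta + \cfrac{1}{1 + \cfrac{1}{x}}$. Thus $\beta = \lfloor x \rfloor$ and $CF[x] = [\beta, 1, \beta, 1, \ldots]$. If $\beta > 1$ then $CF[x] = [\overline{\beta, 1}]$, so k = 2; $b_0 = \beta$, $b_1 = 1$, $M = M[\beta, 1] = \begin{pmatrix} \beta & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \beta+1 & \beta \\ 1 & 1 \end{pmatrix} = W$. If $\beta = 1$, $CF[x] = [\overline{1}]$, k = 1, $b_0 = 1$, $M = M[1] = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ and $W = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} = M^2$. This completes the proof that $G_x = \Gamma$.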


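Now $U = \varphi(G_x) = \{\pm\varepsilon^n\}_{n \in \mathbf{Z}}$ where $\varepsilon = \varphi(M) > 1$. $\varepsilon$ clearly is the smallest unit $> 1$ and is called the fundamental unit for the discriminant d. We now write using (19), and keep this notation throughout,
\[ \varepsilon = \frac{t + u\sqrt{d}}{2}, \qquad t, u \in \mathbf{Z}, \quad t \equiv ud \pmod 2. \]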


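To summarize: to find $\varepsilon$ choose any $x \in R(d)$ - there is always at least $z = \frac{\sqrt{d} + g}{2}$ at hand - determine $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$ and set
\[ M = M[b_0, b_1, \ldots, b_{k-1}] = \begin{pmatrix} A_{k-1} & A_{k-2} \\ B_{k-1} & B_{k-2} \end{pmatrix}. \]
Then $\varepsilon = \varphi(M) = B_{k-1}x + B_{k-2}$. For example, with $d = 37$, $z = \frac{\sqrt{37} + 5}{2}$, we had (in the previous section) $CF[z] = [\overline{5,1,1}]$. Thus $M = \begin{pmatrix} 11 & 6 \\ 2 & 1 \end{pmatrix}$, $\varepsilon = 2z + 1 = \frac{12 + 2\sqrt{37}}{2}$, $t = 12$, $u = 2$.

Returning to the general case, $\varepsilon = \varphi(M)$ is equivalent to $M = \rho(\varepsilon)$, which means that once $\varepsilon$ is known, M can be calculated as the matrix representing multiplication by $\varepsilon$ on the basis $x, 1$. Write $x = \frac{\sqrt{d} + p}{q}$ in standard form. Then straightforward computation shows
\[ \varepsilon x = \frac{t + pu}{2}\,x + \frac{q'u}{2}, \qquad \varepsilon = \frac{qu}{2}\,x + \frac{t - pu}{2}, \]
provided one recalls $d = p^2 + qq'$. Thus
\[ \begin{pmatrix} A_{k-1} & A_{k-2} \\ B_{k-1} & B_{k-2} \end{pmatrix} = M = \rho(\varepsilon) = \begin{pmatrix} \frac{t + pu}{2} & \frac{1}{2}q'u \\ \frac{1}{2}qu & \frac{t - pu}{2} \end{pmatrix} \tag{22} \]
gives two independent presentations of M; the one on the left depends only on $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$ and the one on the right depends on knowing $\varepsilon$. This gives a new way of calculating $CF[x]$, purely rationally, once $\varepsilon$ is known. Namely, since $M \in G^+$, by our theorem of section 2 it has a unique representation as $M[b_0, b_1, \ldots, b_{k-1}]$ found by computing $CF_\sigma\left[\frac{(t + pu)/2}{qu/2}\right]$ where $\sigma = \det M = N\varepsilon$. Thus the period of $CF[x]$ is found by computing the continued fraction of a rational number. We state this as a theorem.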


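Theorem If $\varepsilon = \frac{t + u\sqrt{d}}{2}$ is known, then for $x = \frac{\sqrt{d} + p}{q} \in R(d)$, $CF[x]$ may be found as follows. Let $\sigma = N\varepsilon = \pm 1$, and compute
\[ CF_\sigma\left[\frac{(t + pu)/2}{qu/2}\right] = [b_0, b_1, \ldots, b_{k-1}]. \]
Then $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$.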
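The theorem can be tried out numerically. The following is a minimal Python sketch, not the author's procedure verbatim: it assumes that $CF_\sigma$ means the expansion of the rational number whose length k satisfies $(-1)^k = \sigma$, and the function names (`rational_cf`, `cf_with_parity`, `period_from_unit`, `period_direct`) are hypothetical. The direct computation of the period, used only as a cross-check, is the standard recurrence for $(\sqrt{d} + p)/q$ and assumes x is reduced with q dividing $d - p^2$.

```python
from math import isqrt


def rational_cf(num, den):
    """Regular continued fraction of num/den (den > 0) via the Euclidean algorithm."""
    terms = []
    while den:
        a, r = divmod(num, den)
        terms.append(a)
        num, den = den, r
    return terms


def cf_with_parity(num, den, sigma):
    """Expansion of num/den whose length k satisfies (-1)**k == sigma (assumed meaning of CF_sigma)."""
    terms = rational_cf(num, den)
    if (-1) ** len(terms) != sigma:
        if terms[-1] > 1:
            # [..., a] = [..., a - 1, 1] changes the length by one
            terms[-1:] = [terms[-1] - 1, 1]
        elif len(terms) > 1:
            # [..., a, 1] = [..., a + 1]
            terms[-2:] = [terms[-2] + 1]
    return terms


def period_from_unit(d, t, u, p, q):
    """Period of x = (sqrt(d) + p)/q computed from eps = (t + u*sqrt(d))/2."""
    sigma = (t * t - d * u * u) // 4  # norm of eps, +1 or -1
    return cf_with_parity((t + p * u) // 2, (q * u) // 2, sigma)


def period_direct(d, p, q):
    """Period of the reduced quadratic irrational (sqrt(d) + p)/q, for cross-checking."""
    root, p0, q0, terms = isqrt(d), p, q, []
    while True:
        a = (p + root) // q
        terms.append(a)
        p = a * q - p
        q = (d - p * p) // q
        if (p, q) == (p0, q0):
            return terms


if __name__ == "__main__":
    print(period_from_unit(37, 12, 2, 5, 2))
    print(period_from_unit(1009, 1080, 34, 25, 12))
```

Both routes should agree on any reduced x, which makes the pair a convenient sanity check when experimenting with other discriminants.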


Note from (22) and (10) that $(-1)^k = \det M = N\varepsilon$ so that the parity of k, the length of the period of $CF[x]$, depends only on d, not the specific $x \in R(d)$. k is odd for all $x \in R(d)$ iff $N\varepsilon = -1$ and is even for all $x \in R(d)$ iff $N\varepsilon = 1$.


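Here is a numerical example. For $d = 1009$, $z = \frac{\sqrt{1009} + 31}{2}$ has, by the usual method of computation, $CF[z] = [\overline{31,2,1,1,1,1,2}]$, $k = 7$. Then by the method described in the table (6) one finds $M = \begin{pmatrix} A_6 & A_5 \\ B_6 & B_5 \end{pmatrix} = \begin{pmatrix} 1067 & 408 \\ 34 & 13 \end{pmatrix}$, so $\varepsilon = 34z + 13 = \frac{1080 + 34\sqrt{1009}}{2}$, $t = 1080$, $u = 34$, $N\varepsilon = -1$. Consider now $x = \frac{\sqrt{1009} + 25}{12} \in R(1009)$, $p = 25$, $q = 12$, $q' = 32$. Now, by the theorem, we calculate $CF_{-1}\left[\frac{965}{204}\right] = [4,1,2,1,2,2,3,1,1]$ and so $CF[x] = [\overline{4,1,2,1,2,2,3,1,1}]$.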


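For certain d, $\varepsilon$ is known a priori. Suppose $d = n^2 + 4$, $n \ge 1$. Then $\frac{n + \sqrt{d}}{2}$ is a unit in $O_d$ and is $> 1$. It must be the fundamental unit $\varepsilon$, for otherwise it is $\varepsilon^j$ for some $j > 1$ and then the coefficient of $\sqrt{d}$ would be greater than 1. Thus $\varepsilon = \frac{n + \sqrt{d}}{2}$, $t = n$, $u = 1$ and $N\varepsilon = -1$. Note that in this case $z = \frac{\sqrt{d} + n}{2} = \varepsilon$, and since $\varepsilon^2 - n\varepsilon - 1 = 0$, $\varepsilon = n + \frac{1}{\varepsilon}$ shows $CF[z] = [\overline{n}]$. This is, in fact, the only time the period has length 1. By the theorem we have that for $x = \frac{\sqrt{d} + p}{q} \in R(d)$, $CF[x]$ is found from $CF_{-1}\left[\frac{(n + p)/2}{q/2}\right]$. For example $904 = 30^2 + 4$, $x = \frac{\sqrt{904} + 26}{6}$ is reduced and $CF_{-1}\left[\frac{28}{3}\right] = [9,2,1]$ so $CF[x] = [\overline{9,2,1}]$.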
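Other cases where $\varepsilon$ is immediately known are:

$d = n^2 + 1$, n even $\ge 4$, $\varepsilon = \frac{2n + 2\sqrt{d}}{2}$, $N\varepsilon = -1$,

$d = n^2 - 4$, $n \ge 4$, $\varepsilon = \frac{n + \sqrt{d}}{2}$, $N\varepsilon = 1$,

$d = n^2 - 1$, n odd $\ge 5$, $\varepsilon = \frac{2n + 2\sqrt{d}}{2}$, $N\varepsilon = 1$.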


It is interesting to note that our determination of $G_x$ gives a new criterion for $x \in R$: An irrational number x is a reduced quadratic irrational if and only if x is positive and $S(x) = x$ for some matrix $S \in G^+$.

We have already seen that the condition is necessary. On the other hand, suppose $S(x) = x$, $S \in G^+$. Then by the Theorem of Section 2, $S = M[b_0, b_1, \ldots, b_{n-1}]$ for a uniquely determined sequence of positive integers. There is a unique, largest integer $j \ge 1$ such that $b_0, b_1, \ldots, b_{n-1}$ is a j-fold repetition of the initial k terms $b_0, b_1, \ldots, b_{k-1}$; $n = jk$. Let $y = [\overline{b_0, b_1, \ldots, b_{k-1}}]$. Then $y \in R$ and $G_y$ is generated by $M = M[b_0, b_1, \ldots, b_{k-1}]$. But $S = M^j$, so S fixes y and $y'$. Since S has only two fixed points, $x = y$ or $x = y'$. But $x > 0$, so $x = y$, and x is reduced.




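5. G mod 2. Let $F = \mathbf{Z} \bmod 2$, the field with two elements 0, 1. We use the usual integer symbols, so one should note, from the context, when the arithmetic is being done mod 2. In this section all congruences $a \equiv b$ are to be understood mod 2 unless indicated otherwise. The map $n \to n \bmod 2$ of $\mathbf{Z}$ onto F induces a homomorphism
\[ \Psi : G \to GL(2, F), \qquad \Psi\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \bmod 2 & b \bmod 2 \\ c \bmod 2 & d \bmod 2 \end{pmatrix}. \]
For $M \in G$ we denote $\Psi(M)$ by $\bar{M}$, and $GL(2, F)$ by $\bar{G}$. One easily sees that $\bar{G}$ is a group of order 6, its elements being
\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \text{ (the identity element)}, \quad U = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \quad U^2 = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \quad R = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad UR = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad U^2R = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}. \]
$U, U^2$ have order 3, $R, UR, U^2R$ have order 2 and $UR = RU^2$. Thus $\bar{G} \cong S_3$, the symmetric group of degree 3. $U, U^2$ correspond to 3-cycles and $R, UR, U^2R$ to transpositions. For $M \in G$, $S \in \bar{G}$ we also say $M \equiv S$ rather than $\bar{M} = S$. For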


7 5 21 8 
example [7 Ἴευ, [" er. Σ, Ξ- 1}. Σ, ={U,U7}, 2, = {R,UR,U? R} are 


the conjugacy classes of G. 


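Theorem Let d be a discriminant, $\varepsilon = \frac{t + u\sqrt{d}}{2}$, $t \equiv ud$, the fundamental unit. Let $x \in R(d)$, $CF[x] = [\overline{b_0, b_1, \ldots, b_{k-1}}]$, $M = M[b_0, b_1, \ldots, b_{k-1}]$. Then the conjugacy class of $\bar{M} \in \bar{G}$ depends only on d, as follows:

(a) $d \equiv 1 \pmod 8$ implies $\bar{M} = I$.

(b) $d \equiv 5 \pmod 8$ implies $\bar{M} = I$ if u is even, and $\bar{M} \in \Sigma_2 = \{U, U^2\}$ if u is odd.

(c) d even implies $\bar{M} = I$ if u is even, and $\bar{M} \in \Sigma_3 = \{R, UR, U^2R\}$ if u is odd.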


Proof As usual, write $x = \frac{\sqrt{d} + p}{q}$ and $M = \begin{pmatrix} \frac{t + pu}{2} & \frac{1}{2}q'u \\ \frac{1}{2}qu & \frac{t - pu}{2} \end{pmatrix}$ as in (22). If u is even, then both off-diagonal elements are 0 mod 2 and the only matrix in $\bar{G}$ with this property is I; thus $\bar{M} = I$ whenever u is even. If $d \equiv 1 \pmod 8$, $t^2 - du^2 = \pm 4$ implies $t^2 - u^2 \equiv 4 \pmod 8$. But if t, u are odd, $t^2 - u^2 \equiv 1 - 1 = 0 \pmod 8$, hence u, and t, must be even. This proves (a) and the case u even in (b), (c). Now assume u odd. If $d \equiv 5 \pmod 8$, $d = p^2 + qq'$, p odd, so $qq' \equiv 5 - 1 = 4 \pmod 8$. Thus $\frac{1}{2}q$, $\frac{1}{2}q'$ are both odd and the off-diagonal elements of M are odd. Also, $t \equiv du \equiv 1$ so trace(M) = t is odd. So one of the diagonal entries of M is even and one is odd. Thus $\bar{M} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = U$ or $\bar{M} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = U^2$, i.e. $\bar{M} \in \Sigma_2$. If d is even, then so is p, and $\gcd\left(\frac{1}{2}q, p, \frac{1}{2}q'\right) = 1$ implies at least one of $\frac{1}{2}q$, $\frac{1}{2}q'$ is odd. Now trace(M) = $t \equiv du \equiv 0$, so the diagonal entries are both even or both odd. But this characterizes $\bar{M} \in \Sigma_3$ and the proof is done.

Note that we have $\bar{M} = I$ if and only if u is even. Now $\bar{G}$, being isomorphic to $S_3$, has a homomorphism $\bar{G} \to \{\pm 1\}$ defined by assigning to each $S \in \bar{G}$ its sign as a permutation. Thus $S \to -1$ if $S \in \Sigma_3$ and $S \to 1$ otherwise. Composing with $\Psi : G \to \bar{G}$ we obtain a homomorphism, or character, $\chi$, of G to $\{\pm 1\}$. $\chi(M) = -1$ for $\bar{M} \in \Sigma_3$ and $\chi(M) = 1$ for $\bar{M} \in \Sigma_1 \cup \Sigma_2$.
Keeping with the notation of the theorem, we have
\[ M = \begin{pmatrix} b_0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} b_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} b_{k-1} & 1 \\ 1 & 0 \end{pmatrix}. \]
But $\begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix} \equiv R$ or $U$ according as b is even or odd. So $\chi\begin{pmatrix} b_i & 1 \\ 1 & 0 \end{pmatrix} = -1$ or $1$ according as $b_i$ is even or odd. Since $\chi$ is a homomorphism, $\chi(M) = (-1)^e$ where $e = e(x)$ is the number of terms $b_i$ in the period of $CF[x]$ that are even. Combining with the theorem we thus have: e odd iff $\chi(M) = -1$ iff $\bar{M} \in \Sigma_3$ iff d even and u odd. Thus d odd implies e even, and d even implies e odd or even according as u is odd or even. But this is precisely the theorem stated in the Introduction, as promised.
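The reduction mod 2 is easy to experiment with. The sketch below is only an illustration under stated assumptions: the names `period_matrix`, `class_mod_2`, `even_terms` and the lookup table `CLASSES` are hypothetical, and the six residue matrices are labelled with $U = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ and $R = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ as in the proof above. It multiplies out $M[b_0, \ldots, b_{k-1}]$, reduces it mod 2, and counts the even terms of the period.

```python
def period_matrix(period):
    """M[b_0, ..., b_{k-1}]: the product of the matrices (b 1; 1 0)."""
    a, b, c, d = 1, 0, 0, 1
    for term in period:
        # (a b; c d) * (term 1; 1 0)
        a, b, c, d = a * term + b, a, c * term + d, c
    return (a, b), (c, d)


# The six elements of GL(2, F_2), labelled as in the text.
CLASSES = {
    ((1, 0), (0, 1)): "I",
    ((1, 1), (1, 0)): "U",
    ((0, 1), (1, 1)): "U^2",
    ((0, 1), (1, 0)): "R",
    ((1, 1), (0, 1)): "UR",
    ((1, 0), (1, 1)): "U^2R",
}

SIGMA_3 = {"R", "UR", "U^2R"}  # the transpositions


def class_mod_2(period):
    """Name of the residue of M[period] modulo 2."""
    (a, b), (c, d) = period_matrix(period)
    return CLASSES[((a % 2, b % 2), (c % 2, d % 2))]


def even_terms(period):
    """e(x): the number of even terms in the period."""
    return sum(1 for term in period if term % 2 == 0)


if __name__ == "__main__":
    for period in ([31, 2, 1, 1, 1, 1, 2], [9, 2, 1], [10, 2, 1]):
        print(period, class_mod_2(period), even_terms(period))
```

In each case e is odd exactly when the residue lies in `SIGMA_3`, which is the statement $\chi(M) = (-1)^e$.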


Note that for d odd the result in terms of e is weaker than the result of the theorem of this section, since e does not differentiate between $\Sigma_1$ and $\Sigma_2$. For d odd we always have e even, and $\bar{M} = I$ if u is even and $\bar{M} = U$ or $U^2$ if u is odd.


We conclude with some numerical examples. 


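(1) At the end of the previous section we had for $d = 1009 \equiv 1 \pmod 8$, $\varepsilon = \frac{1080 + 34\sqrt{1009}}{2}$, $N\varepsilon = -1$, $u = 34$ even. We found

(a) $CF\left[\frac{\sqrt{1009} + 31}{2}\right] = [\overline{31,2,1,1,1,1,2}]$, $k = 7$; $e = 2$, $M \equiv URU^4R = I$.

(b) $CF\left[\frac{\sqrt{1009} + 25}{12}\right] = [\overline{4,1,2,1,2,2,3,1,1}]$, $k = 9$, $e = 4$, $M \equiv RURUR^2U^3 = I$, all as predicted.

(2) For $d = 904 = 30^2 + 4$, even, we saw $CF\left[\frac{\sqrt{904} + 30}{2}\right] = [\overline{30}]$, $k = 1$, $e = 1$, $\varepsilon = \frac{30 + \sqrt{904}}{2}$, $N\varepsilon = -1$, $u = 1$, odd. Here $M \equiv R$. Also $CF\left[\frac{\sqrt{904} + 26}{6}\right] = [\overline{9,2,1}]$, $k = 3$, $e = 1$, $M \equiv URU = R$.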


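(3) Consider $x = [\overline{10,2,1}]$; what can be said about $d = \mathrm{disc}(x)$? Here $e = 2$ does not tell us too much, but $M \equiv R^2U = U$, so we know $d \equiv 5 \pmod 8$ and u is odd. Also $k = 3$, so $N\varepsilon = -1$. Calculating $M = \begin{pmatrix} 31 & 21 \\ 3 & 2 \end{pmatrix}$, one can now find x by solving the quadratic equation $M(x) = x$. But it's easier to proceed using (22), which gives
\[ \begin{pmatrix} \frac{t + pu}{2} & \frac{1}{2}q'u \\ \frac{1}{2}qu & \frac{t - pu}{2} \end{pmatrix} = \begin{pmatrix} 31 & 21 \\ 3 & 2 \end{pmatrix}. \]
Since $1 = \gcd\left(\frac{1}{2}q, p, \frac{1}{2}q'\right)$, $u = \gcd\left(\frac{1}{2}qu, pu, \frac{1}{2}q'u\right) = \gcd(3, 29, 21) = 1$, we have $q = 6$, $q' = 42$, $p = 29$, $d = p^2 + qq' = 1093 \equiv 5 \pmod 8$ and $x = \frac{\sqrt{1093} + 29}{6}$, $t = 31 + 2 = 33$ and $\varepsilon = \frac{33 + \sqrt{1093}}{2}$.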

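(4) In the same way consider $x = [\overline{2,2,2,1,1,1,1,1}]$, $k = 8$, $N\varepsilon = 1$. Since $e = 3$ we have d even and u odd; $M \equiv R^3U^5 = UR$. We find $M = \begin{pmatrix} 121 & 75 \\ 50 & 31 \end{pmatrix}$, so $u = \gcd(50, 90, 75) = 5$, $q = 20$, $q' = 30$, $p = 18$, $d = 18^2 + 20 \times 30 = 924$, $x = \frac{\sqrt{924} + 18}{20}$. Finally $t = 121 + 31 = 152$, $\varepsilon = \frac{152 + 5\sqrt{924}}{2}$.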
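Examples (3) and (4) follow one recipe, which can be sketched in Python as below. This is an illustrative reading of (22), not code from the paper: the function names `period_matrix` and `recover_from_period` are hypothetical, and the sketch assumes the input is the full primitive period of a purely periodic expansion.

```python
from math import gcd


def period_matrix(period):
    """M[b_0, ..., b_{k-1}]: the product of the matrices (b 1; 1 0)."""
    a, b, c, d = 1, 0, 0, 1
    for term in period:
        a, b, c, d = a * term + b, a, c * term + d, c
    return (a, b), (c, d)


def recover_from_period(period):
    """Return (d, p, q, t, u) with x = (sqrt(d) + p)/q and eps = (t + u*sqrt(d))/2.

    Reads off (22): the entries of M are (t + pu)/2, q'u/2, qu/2, (t - pu)/2,
    and u is the gcd of qu/2, pu and q'u/2.
    """
    (m11, m12), (m21, m22) = period_matrix(period)
    u = gcd(gcd(m21, m11 - m22), m12)
    q = 2 * m21 // u
    q_prime = 2 * m12 // u
    p = (m11 - m22) // u
    d = p * p + q * q_prime
    t = m11 + m22
    return d, p, q, t, u


if __name__ == "__main__":
    print(recover_from_period([10, 2, 1]))
    print(recover_from_period([2, 2, 2, 1, 1, 1, 1, 1]))
```

The norm of the recovered unit, $(t^2 - du^2)/4$, should come out as $(-1)^k$, which gives a quick consistency check on the result.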




References 

1. Adams,W.W. and Goldstein,L.J., Introduction to Number Theory, Prentice-Hall, 
1976. 

2. Borevich,Z.I. and Shafarevich,I.R., Number Theory, Academic Press, 1966. 

3. Lewittes, J., “Quadratic Irrationals and Continued Fractions” in Number Theory - 
New York Seminar 1991-1995,Chudnovsky, D.V. and Chudnovsky G.V. and 
Nathanson, M.B. (Eds.), Springer, New York,1996. 


4. Ono,T., An Introduction to Algebraic Number Theory, Plenum Press, 1990. 
5. Perron, O., Die Lehre von den Kettenbrüchen (Band I), B.G. Teubner, 1954.


6. Rockett, A. and Szüsz, P., Continued Fractions, World Scientific, 1992.


The inverse problem for representation 
functions of additive bases* 


Melvyn B. Nathanson†
Department of Mathematics 
Lehman College (CUNY) 

Bronx, New York 10468 
Email: nathansn@alpha.lehman.cuny.edu 


Abstract 


Let A be a set of integers. For every integer n, let $r_{A,2}(n)$ denote the number of representations of n in the form $n = a_1 + a_2$, where $a_1, a_2 \in A$ and $a_1 \le a_2$. The function $r_{A,2} : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\}$ is the representation function of order 2 for A. The set A is called an asymptotic basis of order 2 if $r_{A,2}^{-1}(0)$ is finite, that is, if every integer with at most a finite number of exceptions can be represented as the sum of two not necessarily distinct elements of A. It is proved that every function is a representation function, that is, if $f : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\}$ is any function such that $f^{-1}(0)$ is finite, then there exists a set A of integers such that $f(n) = r_{A,2}(n)$ for all $n \in \mathbf{Z}$. Moreover, the set A can be constructed so that $\mathrm{card}\{a \in A : |a| \le x\} \gg x^{1/3}$.


1 Representation functions 


Let $\mathbf{N}$, $\mathbf{N}_0$, and $\mathbf{Z}$ denote the positive integers, nonnegative integers, and integers, respectively. Let A and B be sets of integers. We define the sumset
\[ A + B = \{a + b : a \in A \text{ and } b \in B\}, \]
and, in particular,
\[ 2A = A + A = \{a_1 + a_2 : a_1, a_2 \in A\} \]
and
\[ A + b = A + \{b\} = \{a + b : a \in A\}. \]


*2000 Mathematics Subject Classification: 11B13, 11B34, 11B05. Key words and phrases. Additive bases, sumsets, representation functions, density, Erdős-Turán conjecture, Sidon set.

†This work was supported in part by grants from the NSA Mathematical Sciences Program and the PSC-CUNY Research Award Program.


D. Chudnovsky et al. (eds.), Number Theory 


© Springer-Verlag New York, Inc. 2004 


The restricted sumsets are
\[ A \,\hat{+}\, B = \{a + b : a \in A,\ b \in B, \text{ and } a \ne b\} \]
and
\[ 2^{\wedge}A = A \,\hat{+}\, A = \{a_1 + a_2 : a_1, a_2 \in A \text{ and } a_1 \ne a_2\}. \]
Similarly, we define the difference set
\[ A - B = \{a - b : a \in A \text{ and } b \in B\} \]
and
\[ -A = \{0\} - A = \{-a : a \in A\}. \]
We introduce the counting function
\[ A(y, x) = \sum_{\substack{a \in A \\ y \le a \le x}} 1. \]
Thus, $A(-x, x)$ counts the number of elements $a \in A$ such that $|a| \le x$.

For functions f and g, we write $f \gg g$ if there exist numbers $c_0$ and $x_0$ such that $|f(x)| \ge c_0|g(x)|$ for all $x \ge x_0$, and $f \ll g$ if $|f(x)| \le c_0|g(x)|$ for all $x \ge x_0$.

In this paper we study representation functions of sets of integers. For any set $A \subseteq \mathbf{Z}$, the representation function $r_{A,2}(n)$ counts the number of ways to write n in the form $n = a_1 + a_2$, where $a_1, a_2 \in A$ and $a_1 \le a_2$. The set A is called an asymptotic basis of order 2 if all but finitely many integers can be represented as the sum of two not necessarily distinct elements of A, or, equivalently, if the function
\[ r_{A,2} : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\} \]
satisfies
\[ \mathrm{card}\left(r_{A,2}^{-1}(0)\right) < \infty. \]
Similarly, the restricted representation function $\hat{r}_{A,2}(n)$ counts the number of ways to write n in the form $n = a_1 + a_2$, where $a_1, a_2 \in A$ and $a_1 < a_2$. The set A is called a restricted asymptotic basis of order 2 if all but finitely many integers can be represented as the sum of two distinct elements of A.
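For a finite set both representation functions can be tabulated directly. The following Python sketch illustrates the two definitions; the function names `rep_function` and `restricted_rep_function` are hypothetical, and a finite set can of course only approximate the infinite sets studied in this paper.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement


def rep_function(A):
    """r_{A,2}: counts pairs a1 <= a2 in A with a1 + a2 = n, for every attained n."""
    return Counter(a + b for a, b in combinations_with_replacement(sorted(A), 2))


def restricted_rep_function(A):
    """Restricted version: counts pairs a1 < a2 in A with a1 + a2 = n."""
    return Counter(a + b for a, b in combinations(sorted(A), 2))


if __name__ == "__main__":
    A = {-3, 0, 1, 2}
    print(sorted(rep_function(A).items()))
    print(sorted(restricted_rep_function(A).items()))
```

A `Counter` returns 0 for sums that never occur, which matches the convention that $r_{A,2}(n) = 0$ when n is not in 2A.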

Let
\[ f : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\} \tag{1} \]
be any function such that
\[ \mathrm{card}\left(f^{-1}(0)\right) < \infty. \tag{2} \]


The inverse problem for representation functions of order 2 is to find sets A such that $r_{A,2}(n) = f(n)$ for all $n \in \mathbf{Z}$. Nathanson [4] proved that every function f satisfying (1) and (2) is the representation function of an asymptotic basis of order 2, and that such bases A can be arbitrarily thin in the sense that the counting functions $A(-x, x)$ tend arbitrarily slowly to infinity. It remained an open problem to construct thick asymptotic bases of order 2 for the integers with a prescribed representation function.

In the special case of the function $f(n) = 1$ for all integers n, Nathanson [6] constructed a unique representation basis, that is, a set A of integers with $r_{A,2}(n) = 1$ for all $n \in \mathbf{Z}$, with the additional property that $A(-x, x) \gg \log x$. He posed the problem of constructing a unique representation basis A such that $A(-x, x) \gg x^{\alpha}$ for some $\alpha > 0$.

In this paper we prove that for every function f satisfying (1) and (2) there exist uncountably many asymptotic bases A of order 2 such that $r_{A,2}(n) = f(n)$ for all $n \in \mathbf{Z}$, and $A(-x, x) \gg x^{1/3}$. It is not known if there exists a real number $\delta > 0$ such that one can solve the inverse problem for arbitrary functions f satisfying (1) and (2) with $A(-x, x) \gg x^{1/3 + \delta}$.


2 The Erdős-Turán conjecture


The set A of nonnegative integers is an asymptotic basis of order 2 for $\mathbf{N}_0$ if the sumset 2A contains all sufficiently large integers. If A is a set of nonnegative integers, then
\[ 0 \le r_{A,2}(n) < \infty \]
for every $n \in \mathbf{N}_0$. It is not true, however, that if
\[ f : \mathbf{N}_0 \to \mathbf{N}_0 \]
is a function with
\[ \mathrm{card}\left(f^{-1}(0)\right) < \infty, \]
then there must exist a set A of nonnegative integers such that $r_{A,2}(n) = f(n)$ for all $n \in \mathbf{N}_0$. For example, Dirac [1] proved that the representation function of an asymptotic basis of order 2 cannot be eventually constant, and Erdős and Fuchs [3] proved that the mean value $\sum_{n \le x} r_{A,2}(n)$ of an asymptotic basis of order 2 cannot converge too rapidly to $cx$ for any $c > 0$. A famous conjecture of Erdős and Turán [2] states that the representation function of an asymptotic basis of order 2 must be unbounded. This problem is only a special case of the general inverse problem for representation functions for bases for the nonnegative integers: Find necessary and sufficient conditions for a function $f : \mathbf{N}_0 \to \mathbf{N}_0$ satisfying $\mathrm{card}(f^{-1}(0)) < \infty$ to be the representation function of an asymptotic basis of order 2 for $\mathbf{N}_0$.

It is a remarkable recent discovery that the inverse problem for represen- 
tation functions for the integers, and, more generally, for arbitrary countably 
infinite abelian groups and countably infinite abelian semigroups with a group 
component, is significantly easier than the inverse problem for representation 
functions for the nonnegative integers and for other countably infinite abelian 
semigroups (Nathanson [5]). 




3 Construction of thick bases for the integers 
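Let $[x]$ denote the integer part of the real number x.


Lemma 1 Let $f : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\}$ be a function such that $f^{-1}(0)$ is finite. Let $\Delta$ denote the cardinality of the set $f^{-1}(0)$. Then there exists a sequence $U = \{u_k\}_{k=1}^{\infty}$ of integers such that, for every $n \in \mathbf{Z}$ and $k \in \mathbf{N}$,
\[ f(n) = \mathrm{card}\{k \ge 1 : u_k = n\} \]
and
\[ |u_k| \le \left[\frac{k + \Delta}{2}\right]. \]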


Proof. Every positive integer m can be written uniquely in the form
\[ m = s^2 + s + 1 + r, \]
where s is a nonnegative integer and $|r| \le s$. We construct the sequence
\[ V = \{0, -1, 0, 1, -2, -1, 0, 1, 2, -3, -2, -1, 0, 1, 2, 3, \ldots\} = \{v_m\}_{m=1}^{\infty}, \]
where
\[ v_{s^2 + s + 1 + r} = r \quad \text{for } |r| \le s. \]
For every nonnegative integer k, the first occurrence of $-k$ in this sequence is $v_{k^2 + 1} = -k$, and the first occurrence of k in this sequence is $v_{(k+1)^2} = k$.

The sequence U will be the unique subsequence of V constructed as follows. Let $n \in \mathbf{Z}$. If $f(n) = \infty$, then U will contain the terms $v_{s^2 + s + 1 + n}$ for every $s \ge |n|$. If $f(n) = \ell < \infty$, then U will contain the $\ell$ terms $v_{s^2 + s + 1 + n}$ for $s = |n|, |n| + 1, \ldots, |n| + \ell - 1$, but not the terms $v_{s^2 + s + 1 + n}$ for $s \ge |n| + \ell$. Let $m_1 < m_2 < m_3 < \cdots$ be the strictly increasing sequence of positive integers such that $\{v_{m_k}\}_{k=1}^{\infty}$ is the resulting subsequence of V. Let $U = \{u_k\}_{k=1}^{\infty}$, where $u_k = v_{m_k}$. Then
\[ f(n) = \mathrm{card}\{k \ge 1 : u_k = n\}. \]
Let $\mathrm{card}(f^{-1}(0)) = \Delta$. The sequence U also has the following property: If $|u_k| = n$, then for every integer $m \notin f^{-1}(0)$ with $|m| < n$ there is a positive integer $j < k$ with $u_j = m$. It follows that
\[ \{0, 1, -1, 2, -2, \ldots, n - 1, -(n - 1)\} \setminus f^{-1}(0) \subseteq \{u_1, u_2, \ldots, u_{k-1}\}, \]
and so
\[ k - 1 \ge 2(n - 1) + 1 - \Delta. \]
This implies that
\[ |u_k| = n \le \frac{k + \Delta}{2}. \]
Since $u_k$ is an integer, we have
\[ |u_k| \le \left[\frac{k + \Delta}{2}\right]. \]
This completes the proof. □
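The construction in this proof is explicit enough to run. The following Python sketch generates the subsequence U lazily; the names `lemma_sequence` and `first_terms` are hypothetical, and the function f is assumed to be given as a Python callable that may return `float("inf")` for the value $\infty$.

```python
from itertools import count, islice
from math import isqrt


def lemma_sequence(f):
    """Yield u_1, u_2, ... : the subsequence U of V selected in the proof of Lemma 1.

    The term v_m with m = s^2 + s + 1 + r (|r| <= s) equals r, and it is the
    (s - |r| + 1)-th occurrence of r in V; it is kept when that index is <= f(r).
    """
    for m in count(1):
        s = isqrt(m - 1)             # m lies in the block s^2 + 1, ..., (s + 1)^2
        r = m - (s * s + s + 1)      # so that |r| <= s
        if s - abs(r) < f(r):
            yield r


def first_terms(f, k):
    """The first k terms of U as a list."""
    return list(islice(lemma_sequence(f), k))


if __name__ == "__main__":
    print(first_terms(lambda n: 1, 9))
    print(first_terms(lambda n: 0 if n == 0 else 2, 9))
```

With such a prefix in hand, the bound $|u_k| \le [(k + \Delta)/2]$ can be checked term by term for any particular f.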


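Lemma 1 is best possible in the sense that for every nonnegative integer $\Delta$ there is a function $f : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\}$ with $\mathrm{card}(f^{-1}(0)) = \Delta$ and a sequence $U = \{u_k\}_{k=1}^{\infty}$ of integers such that
\[ |u_k| = \left[\frac{k + \Delta}{2}\right] \quad \text{for all } k \ge 1. \tag{3} \]
For example, if $\Delta = 2\delta + 1$ is odd, define the function f by
\[ f(n) = \begin{cases} 0 & \text{if } |n| \le \delta \\ 1 & \text{if } |n| \ge \delta + 1 \end{cases} \]
and the sequence U by
\[ u_{2i-1} = \delta + i, \qquad u_{2i} = -(\delta + i) \]
for all $i \ge 1$.
If $\Delta = 2\delta$ is even, define f by
\[ f(n) = \begin{cases} 0 & \text{if } -\delta \le n \le \delta - 1 \\ 1 & \text{if } n \ge \delta \text{ or } n \le -\delta - 1 \end{cases} \]
and the sequence U by $u_1 = \delta$ and
\[ u_{2i} = \delta + i, \qquad u_{2i+1} = -(\delta + i) \]
for all $i \ge 1$. In both cases the sequence U satisfies (3).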
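Theorem 1 Let $f : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\}$ be any function such that
\[ \Delta = \mathrm{card}\left(f^{-1}(0)\right) < \infty. \]
Let
\[ c = 8 + \left[\frac{\Delta + 1}{2}\right]. \]
There exist uncountably many sets A of integers such that
\[ r_{A,2}(n) = f(n) \quad \text{for all } n \in \mathbf{Z} \]
and
\[ A(-x, x) \ge \left(\frac{x}{c}\right)^{1/3}. \]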




Proof. Let
\[ \Delta = \mathrm{card}\left(f^{-1}(0)\right). \]
By Lemma 1, there exists a sequence $U = \{u_k\}_{k=1}^{\infty}$ of integers such that
\[ f(n) = \mathrm{card}(\{i \in \mathbf{N} : u_i = n\}) \quad \text{for all integers } n \tag{4} \]
and
\[ |u_k| \le \frac{k + \Delta}{2} \quad \text{for all } k \ge 1. \tag{5} \]
We shall construct a strictly increasing sequence $\{i_k\}_{k=1}^{\infty}$ of positive integers and an increasing sequence $\{A_k\}_{k=1}^{\infty}$ of finite sets of integers such that, for all positive integers k,

(i) $|A_k| = 2k$,

(ii) There exists a positive number c such that $A_k \subseteq [-ck^3, ck^3]$,

(iii) $r_{A_k,2}(n) \le f(n)$ for all $n \in \mathbf{Z}$,

(iv) For $j = 1, \ldots, k$, $r_{A_k,2}(u_{i_j}) \ge \mathrm{card}(\{i \le i_k : u_i = u_{i_j}\})$.

Let $\{A_k\}_{k=1}^{\infty}$ be a sequence of finite sets satisfying (i)-(iv). We form the infinite set
\[ A = \bigcup_{k=1}^{\infty} A_k. \]
Let $x \ge 8c$, and let k be the unique positive integer such that
\[ ck^3 \le x < c(k + 1)^3. \]
Conditions (i) and (ii) imply that
\[ A(-x, x) \ge |A_k| = 2k > 2\left(\frac{x}{c}\right)^{1/3} - 2 \ge \left(\frac{x}{c}\right)^{1/3}. \]
Since
\[ f(n) = \lim_{k \to \infty} \mathrm{card}(\{i \le i_k : u_i = n\}), \]
conditions (iii) and (iv) imply that
\[ r_{A,2}(n) = \lim_{k \to \infty} r_{A_k,2}(n) = f(n) \]
for all $n \in \mathbf{Z}$.




We construct the sequence $\{A_k\}_{k=1}^{\infty}$ as follows. Let $i_1 = 1$. The set $A_1$ will be of the form $A_1 = \{a_1 + u_{i_1}, -a_1\}$, where the integer $a_1$ is chosen so that $2A_1 \cap f^{-1}(0) = \emptyset$ and $a_1 + u_{i_1} \ne -a_1$. This is equivalent to requiring that
\[ 2a_1 \notin \left(f^{-1}(0) - 2u_{i_1}\right) \cup \left(-f^{-1}(0)\right) \cup \{-u_{i_1}\}. \tag{6} \]
This condition excludes at most $1 + 2\Delta$ integers, and so we have at least two choices for the number $a_1$ such that $|a_1| \le 1 + \Delta$ and $a_1$ satisfies (6). Since $|u_{i_1}| = |u_1| \le (1 + \Delta)/2$ and
\[ |a_1 + u_{i_1}| \le |a_1| + |u_{i_1}| \le \frac{3(1 + \Delta)}{2}, \]
it follows that $A_1 \subseteq [-c, c]$ for any $c \ge 3(1 + \Delta)/2$, and the set $A_1$ satisfies conditions (i)-(iv).

Let $k \ge 2$ and suppose that we have constructed sets $A_1, \ldots, A_{k-1}$ and integers $i_1 < \cdots < i_{k-1}$ that satisfy conditions (i)-(iv). Let $i_k > i_{k-1}$ be the least integer such that
\[ r_{A_{k-1},2}(u_{i_k}) < f(u_{i_k}). \]
Since
\[ i_k - 1 \le \sum_{n \in \{u_1, u_2, \ldots, u_{i_k - 1}\}} r_{A_{k-1},2}(n) \le \sum_{n \in \mathbf{Z}} r_{A_{k-1},2}(n) = \binom{2k - 1}{2} < 2k^2, \]
it follows that
\[ i_k \le 2k^2. \]
Also, (5) implies that
\[ |u_{i_k}| \le \frac{i_k + \Delta}{2} \le k^2 + \frac{\Delta}{2}. \tag{7} \]


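We want to choose an integer $a_k$ such that the set
\[ A_k = A_{k-1} \cup \{a_k + u_{i_k}, -a_k\} \]
satisfies (i)-(iv). We have $|A_k| = 2k$ if
\[ a_k + u_{i_k} \ne -a_k \]
and
\[ A_{k-1} \cap \{a_k + u_{i_k}, -a_k\} = \emptyset, \]
or, equivalently, if
\[ a_k \notin (-A_{k-1}) \cup (A_{k-1} - u_{i_k}) \cup \{-u_{i_k}/2\}. \tag{8} \]
Thus, in order for $A_{k-1} \cup \{a_k + u_{i_k}, -a_k\}$ to satisfy condition (i), we exclude at most $2|A_{k-1}| + 1 = 4k - 3$ integers as possible choices for $a_k$.
The set $A_k$ will satisfy conditions (iii) and (iv) if
\[ 2A_k \cap f^{-1}(0) = \emptyset \]
and
\[ r_{A_k,2}(n) = \begin{cases} r_{A_{k-1},2}(n) & \text{for all } n \in 2A_{k-1} \setminus \{u_{i_k}\} \\ r_{A_{k-1},2}(n) + 1 & \text{for } n = u_{i_k} \\ 1 & \text{for all } n \in 2A_k \setminus (2A_{k-1} \cup \{u_{i_k}\}). \end{cases} \]
Since the sumset $2A_k$ decomposes into
\[ 2A_k = 2\left(A_{k-1} \cup \{a_k + u_{i_k}, -a_k\}\right) = 2A_{k-1} \cup \left(A_{k-1} + \{a_k + u_{i_k}, -a_k\}\right) \cup \{u_{i_k}, 2a_k + 2u_{i_k}, -2a_k\}, \]
it suffices that
\[ \left(A_{k-1} + \{a_k + u_{i_k}, -a_k\}\right) \cap 2A_{k-1} = \emptyset, \tag{9} \]
\[ \left(A_{k-1} + \{a_k + u_{i_k}, -a_k\}\right) \cap f^{-1}(0) = \emptyset, \tag{10} \]
\[ \left(A_{k-1} + a_k + u_{i_k}\right) \cap \left(A_{k-1} - a_k\right) = \emptyset, \tag{11} \]
\[ \{2a_k + 2u_{i_k}, -2a_k\} \cap 2A_{k-1} = \emptyset, \tag{12} \]
\[ \{2a_k + 2u_{i_k}, -2a_k\} \cap f^{-1}(0) = \emptyset, \tag{13} \]
\[ \{2a_k + 2u_{i_k}, -2a_k\} \cap \left(A_{k-1} + \{a_k + u_{i_k}, -a_k\}\right) = \emptyset. \tag{14} \]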


Equation (9) implies that the integer $a_k$ must be chosen so that it cannot be represented either in the form
\[ a_k = x_1 + x_2 - x_3 - u_{i_k} \]
or
\[ a_k = x_1 - x_2 - x_3, \]
where $x_1, x_2, x_3 \in A_{k-1}$. Since $\mathrm{card}(A_{k-1}) = 2(k - 1)$, it follows that the number of integers that cannot be chosen as the integer $a_k$ because of equation (9) is at most $2(2(k - 1))^3 = 16(k - 1)^3$.

Similarly, the numbers of integers excluded as possible choices for $a_k$ because of equations (10), (11), (12), (13), and (14) are at most $4\Delta(k - 1)$, $4(k - 1)^2$, $8(k - 1)^2$, $2\Delta$, and $8(k - 1)$, respectively, and so the number of integers that cannot be chosen as $a_k$ is
\[ 16(k - 1)^3 + 12(k - 1)^2 + (4\Delta + 8)(k - 1) + 2\Delta = 16k^3 - 36k^2 + (32 + 4\Delta)k - 2\Delta - 12 \le (16 + \Delta)k^3 - 4k^2 - 32k(k - 1) - 2\Delta - 12. \]


Let
\[ c = 8 + \left[\frac{\Delta + 1}{2}\right]. \]
The number of integers a with
\[ |a| \le ck^3 - k^2 - \frac{\Delta}{2} = \left(8 + \left[\frac{\Delta + 1}{2}\right]\right)k^3 - k^2 - \frac{\Delta}{2} \tag{15} \]
is
\[ \ge (16 + \Delta)k^3 - 2k^2 - \Delta. \]
If the integer a satisfies (15), then (7) implies that
\[ |a + u_{i_k}| \le |a| + |u_{i_k}| \le ck^3. \]
It follows that there are at least two acceptable choices of the integer $a_k$ such that the set $A_k = A_{k-1} \cup \{a_k + u_{i_k}, -a_k\}$ satisfies conditions (i)-(iv). Since this is true at each step of the induction, there are uncountably many sequences $\{A_k\}_{k=1}^{\infty}$ that satisfy conditions (i)-(iv). This completes the proof. □
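The inductive step of this proof can be imitated by brute force on small examples. The Python sketch below is an illustration only, under stated assumptions: instead of checking conditions (9)-(14) one by one it recomputes the representation function and tests the resulting requirement on $r_{A_k,2}$ directly, it always takes the admissible $a_k$ of smallest absolute value (so it produces one set, not uncountably many), and the names `rep_function`, `candidates`, `extend` and `build_basis` are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, count


def rep_function(A):
    """r_{A,2} for a finite set A, as a Counter indexed by the sums."""
    return Counter(a + b for a, b in combinations_with_replacement(sorted(A), 2))


def candidates():
    """Integers in order of increasing absolute value: 0, 1, -1, 2, -2, ..."""
    yield 0
    for j in count(1):
        yield j
        yield -j


def extend(A, u, f):
    """Adjoin {a + u, -a} so that u gains exactly one representation, no old
    count changes, and every other new sum n has one representation and f(n) >= 1."""
    old = rep_function(A)
    for a in candidates():
        pair = {a + u, -a}
        if len(pair) < 2 or pair & A:
            continue
        new = rep_function(A | pair)

        def acceptable(n):
            if n == u:
                return new[n] == old[n] + 1
            if n in old:
                return new[n] == old[n]
            return new[n] == 1 and f(n) >= 1

        if all(acceptable(n) for n in new):
            return A | pair


def build_basis(f, targets):
    """Process a finite prefix of the sequence U, extending whenever r(u) < f(u)."""
    A = set()
    for u in targets:
        if rep_function(A)[u] < f(u):
            A = extend(A, u, f)
    return A


if __name__ == "__main__":
    A = build_basis(lambda n: 1, [0, -1, 1, -2, 2, -3, 3])
    print(sorted(A))
```

For $f(n) = 1$ this builds the beginning of a unique representation basis: every target is represented exactly once and no integer is represented twice.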


We can modify the proof of Theorem 1 to obtain the analogous result for the restricted representation function $\hat{r}_{A,2}(n)$.


Theorem 2 Let $f : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\}$ be any function such that
\[ \mathrm{card}\left(f^{-1}(0)\right) < \infty. \]
Then there exist uncountably many sets A of integers such that
\[ \hat{r}_{A,2}(n) = f(n) \quad \text{for all } n \in \mathbf{Z} \]
and
\[ A(-x, x) \gg x^{1/3}. \]


4 Representation functions for bases of order h 


We can also prove similar results for the representation functions of asymptotic bases and restricted asymptotic bases of order h for all $h \ge 2$.

For any set $A \subseteq \mathbf{Z}$, the representation function $r_{A,h}(n)$ counts the number of ways to write n in the form $n = a_1 + a_2 + \cdots + a_h$, where $a_1, a_2, \ldots, a_h \in A$ and $a_1 \le a_2 \le \cdots \le a_h$. The set A is called an asymptotic basis of order h if all but finitely many integers can be represented as the sum of h not necessarily distinct elements of A, or, equivalently, if the function
\[ r_{A,h} : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\} \]
satisfies
\[ \mathrm{card}\left(r_{A,h}^{-1}(0)\right) < \infty. \]
Similarly, the restricted representation function $\hat{r}_{A,h}(n)$ counts the number of ways to write n as a sum of h pairwise distinct elements of A. The set A is called a restricted asymptotic basis of order h if all but finitely many integers can be represented as the sum of h pairwise distinct elements of A.


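Theorem 3 Let $f : \mathbf{Z} \to \mathbf{N}_0 \cup \{\infty\}$ be any function such that
\[ \mathrm{card}\left(f^{-1}(0)\right) < \infty. \]
There exist uncountably many sets A of integers such that
\[ r_{A,h}(n) = f(n) \quad \text{for all } n \in \mathbf{Z} \]
and
\[ A(-x, x) \gg x^{1/(2h-1)}, \]
and there exist uncountably many sets A of integers such that
\[ \hat{r}_{A,h}(n) = f(n) \quad \text{for all } n \in \mathbf{Z} \]
and
\[ A(-x, x) \gg x^{1/(2h-1)}. \]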


References 


[1] G. A. Dirac, Note on a problem in additive number theory, J. London Math. Soc. 26 (1951), 312-313.

[2] P. Erdős and P. Turán, On a problem of Sidon in additive number theory and some related questions, J. London Math. Soc. 16 (1941), 212-215.

[3] P. Erdős and W. H. J. Fuchs, On a problem of additive number theory, J. London Math. Soc. 31 (1956), 67-73.

[4] M. B. Nathanson, Every function is the representation function of an additive basis for the integers, www.arXiv.org, math.NT/0302091.

[5] M. B. Nathanson, Representation functions of additive bases for abelian semigroups, Ramanujan J., to appear.

[6] M. B. Nathanson, Unique representation bases for the integers, Acta Arith., to appear.


On the ubiquity of Sidon sets* 


Melvyn B. Nathanson†
Department of Mathematics 
Lehman College (CUNY) 

Bronx, New York 10468 
Email: nathansn@alpha.lehman.cuny.edu 


Abstract 


A Sidon set is a set A of integers such that no integer has two essentially distinct representations as the sum of two elements of A. More generally, for every positive integer g, a $B_2[g]$-set is a set A of integers such that no integer has more than g essentially distinct representations as the sum of two elements of A. It is proved that almost all small subsets of $\{1, 2, \ldots, n\}$ are $B_2[g]$-sets, in the sense that if $B_2[g](k, n)$ denotes the number of $B_2[g]$-sets of cardinality k contained in the interval $\{1, 2, \ldots, n\}$, then $\lim_{n \to \infty} B_2[g](k, n)/\binom{n}{k} = 1$ if $k = o\left(n^{g/(2g+2)}\right)$.


1 Sidon sets 


Let A be a nonempty set of positive integers. The sumset 2A is the set of all integers of the form $a_1 + a_2$, where $a_1, a_2 \in A$. The set A is called a Sidon set if every element of 2A has a unique representation as the sum of two elements of A, that is, if
\[ a_1, a_2, a_1', a_2' \in A \]
and
\[ a_1 + a_2 = a_1' + a_2', \]
and if
\[ a_1 \le a_2 \quad \text{and} \quad a_1' \le a_2', \]
then
\[ a_1 = a_1' \quad \text{and} \quad a_2 = a_2'. \]

*2000 Mathematics Subject Classification: 11B13, 11B34, 11B05. Key words and phrases. 
Sidon sets, sumsets, representation functions. 

†Supported in part by grants from the NSA Mathematical Sciences Program and the PSC-CUNY Research Award Program. This paper was written while the author was a visitor at the (alas, now defunct) AT&T Bell Laboratories in Murray Hill, New Jersey, an excellent research institution that split into AT&T Research Labs and Lucent Bell Labs, and provided another instance of a whole being greater than the sum of its parts.




More generally, for positive integers h and g, the h-fold sumset hA is the set of all sums of h not necessarily distinct elements of A. The representation function $r_{A,h}(m)$ counts the number of representations of m in the form
\[ m = a_1 + a_2 + \cdots + a_h, \]
where
\[ a_i \in A \quad \text{for all } i = 1, 2, \ldots, h, \]
and
\[ a_1 \le a_2 \le \cdots \le a_h. \]
The set A is called a $B_h[g]$-set if every element of hA has at most g representations as the sum of h elements of A, that is, if
\[ r_{A,h}(m) \le g \]
for every integer m. In particular, a $B_2[1]$-set is a Sidon set, and $B_h[1]$-sets are usually denoted $B_h$-sets.

Let $h \ge 2$. Let A be a nonempty set of integers, and $a \in A$. Then $r_{A,h}(m + a) \ge r_{A,h-1}(m)$. Therefore, if $r_{A,h-1}(m) > g$ for some $m \in (h - 1)A$, then $r_{A,h}(m + a) > g$ for every $a \in A$. It follows that if A is a $B_h[g]$-set, then A is also a $B_{h-1}[g]$-set. In particular, every $B_h$-set is also a $B_{h-1}$-set.

Let A be a subset of $\{1, 2, \ldots, n\}$, and let $|A|$ denote the cardinality of A. Then $hA \subseteq \{h, h + 1, \ldots, hn\}$. If $|A| = k$, then there are exactly $\binom{k + h - 1}{h}$ ordered h-tuples of the form $(a_1, \ldots, a_h)$, where $a_i \in A$ for all $i = 1, \ldots, h$ and $a_1 \le \cdots \le a_h$. If A is a $B_h[g]$-set and $|A| = k$, then
\[ \frac{k^h}{h!} \le \binom{k + h - 1}{h} = \sum_{m \in hA} r_{A,h}(m) \le g|hA| < ghn, \]
and so
\[ |A| = k < cn^{1/h} \]
for $c = (h!\,gh)^{1/h}$. It follows that if A is a "large" subset of $\{1, 2, \ldots, n\}$, then A cannot be a $B_h[g]$-set. In this paper we prove that almost all "small" subsets of $\{1, 2, \ldots, n\}$ are $B_2[g]$-sets and almost all "small" subsets of $\{1, 2, \ldots, n\}$ are $B_h$-sets.
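For small finite sets the $B_h[g]$ property can be tested by enumerating all nondecreasing h-tuples. The sketch below is a direct transcription of the definition; the names `rep_function_h` and `is_Bhg_set` are hypothetical, and the enumeration is exponential in h, so it is meant only for experiments.

```python
from collections import Counter
from itertools import combinations_with_replacement


def rep_function_h(A, h):
    """r_{A,h}: counts representations m = a_1 + ... + a_h with a_1 <= ... <= a_h in A."""
    return Counter(sum(c) for c in combinations_with_replacement(sorted(A), h))


def is_Bhg_set(A, h, g=1):
    """True if no integer has more than g representations as a sum of h elements of A."""
    return all(v <= g for v in rep_function_h(A, h).values())


if __name__ == "__main__":
    print(is_Bhg_set({1, 2, 5, 7}, 2))        # Sidon test
    print(is_Bhg_set({1, 2, 3, 4}, 2))        # 1 + 4 = 2 + 3
    print(is_Bhg_set({1, 2, 3, 4}, 2, g=2))
```

The total of all counts equals $\binom{k + h - 1}{h}$ for a set of k elements, which is the identity used in the displayed inequality above.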

Notation. If $\{u_n\}_{n=1}^{\infty}$ and $\{v_n\}_{n=1}^{\infty}$ are sequences and $v_n > 0$ for all n, we write $u_n = o(v_n)$ if $\lim_{n \to \infty} u_n/v_n = 0$, and $u_n = O(v_n)$ or $u_n \ll v_n$, if $|u_n| \le cv_n$ for some $c > 0$ and all $n \ge 1$. The number c in this inequality is called the implied constant.


2 Random small $B_2[g]$-sets


We require the following elementary lemma. 




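Lemma 1 If $n \ge 1$ and $0 \le j \le k \le n$, then
\[ \frac{\binom{n - j}{k - j}}{\binom{n}{k}} \le \left(\frac{k}{n}\right)^j. \]
Proof. We have
\[ \frac{\binom{n - j}{k - j}}{\binom{n}{k}} = \frac{(n - j)!\,k!}{n!\,(k - j)!} = \prod_{i=0}^{j-1} \frac{k - i}{n - i} \le \left(\frac{k}{n}\right)^j, \]
since
\[ \frac{k - i}{n - i} \le \frac{k}{n} \]
for $i = 0, 1, \ldots, n - 1$. □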


Theorem 1 For any positive integers g, k, and n, let $B_2[g](k, n)$ denote the number of $B_2[g]$-sets A contained in $\{1, \ldots, n\}$ with $|A| = k$. Then
\[ B_2[g](k, n) > \binom{n}{k}\left(1 - \frac{4k^{2g+2}}{n^g}\right). \]


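Proof. Let A be a subset of $\{1, 2, \ldots, n\}$ of cardinality k. If A is not a $B_2[g]$-set, then there is an integer $m \le 2n$ such that $r_{A,2}(m) > g$, that is, m has at least $g + 1$ representations as the sum of two elements of A. This means that the set A contains $g + 1$ integers $a_1, \ldots, a_{g+1}$ such that
\[ 1 \le a_1 < \cdots < a_{g+1} \le \frac{m}{2}, \]
and A also contains the $g + 1$ integers $m - a_i$ for $i = 1, \ldots, g + 1$. If $a_{g+1} < m/2$, then
\[ |\{a_i, m - a_i : i = 1, \ldots, g + 1\}| = 2g + 2. \]
If $a_{g+1} = m/2$, then
\[ |\{a_i, m - a_i : i = 1, \ldots, g + 1\}| = 2g + 1. \]
Therefore, for each integer m, the number of sets $A \subseteq \{1, \ldots, n\}$ such that $|A| = k$ and $r_{A,2}(m) \ge g + 1$ is at most
\[ \binom{\lfloor (m-1)/2 \rfloor}{g + 1}\binom{n - 2g - 2}{k - 2g - 2} + \binom{\lfloor (m-1)/2 \rfloor}{g}\binom{n - 2g - 1}{k - 2g - 1}, \]
and so
\[ \binom{n}{k} - B_2[g](k, n) \le \sum_{m \le 2n} \left(\binom{\lfloor (m-1)/2 \rfloor}{g + 1}\binom{n - 2g - 2}{k - 2g - 2} + \binom{\lfloor (m-1)/2 \rfloor}{g}\binom{n - 2g - 1}{k - 2g - 1}\right). \]
Observing that, for $m \le 2n$,
\[ \binom{\lfloor (m-1)/2 \rfloor}{g + 1} \le n^{g+1} \]
and
\[ \binom{\lfloor (m-1)/2 \rfloor}{g} \le n^{g}, \]
and applying Lemma 1, we obtain
\[ \frac{\binom{n}{k} - B_2[g](k, n)}{\binom{n}{k}} \le \sum_{m \le 2n} \binom{\lfloor (m-1)/2 \rfloor}{g + 1}\left(\frac{k}{n}\right)^{2g+2} + \sum_{m \le 2n} \binom{\lfloor (m-1)/2 \rfloor}{g}\left(\frac{k}{n}\right)^{2g+1} \le 2n^{g+2}\left(\frac{k}{n}\right)^{2g+2} + 2n^{g+1}\left(\frac{k}{n}\right)^{2g+1} < \frac{4k^{2g+2}}{n^g}. \]
This completes the proof. □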
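For very small parameters, $B_2[g](k, n)$ can be counted exhaustively and compared with the lower bound of Theorem 1. The sketch below does exactly that; the names `is_B2g_set`, `count_B2g_sets` and `theorem1_lower_bound` are hypothetical, and since the bound is only informative when $4k^{2g+2} < n^g$, for the tiny n within reach of brute force it is usually negative and the comparison is just a sanity check.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement
from math import comb


def is_B2g_set(A, g):
    """True if no integer has more than g representations a1 + a2 with a1 <= a2 in A."""
    sums = Counter(a + b for a, b in combinations_with_replacement(A, 2))
    return all(v <= g for v in sums.values())


def count_B2g_sets(k, n, g):
    """B_2[g](k, n): number of k-element B_2[g]-subsets of {1, ..., n}, by exhaustion."""
    return sum(1 for A in combinations(range(1, n + 1), k) if is_B2g_set(A, g))


def theorem1_lower_bound(k, n, g):
    """The quantity binom(n, k) * (1 - 4 k^(2g+2) / n^g) from Theorem 1."""
    return comb(n, k) * (1 - 4 * k ** (2 * g + 2) / n ** g)


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, count_B2g_sets(k, 12, 1), comb(12, k), theorem1_lower_bound(k, 12, 1))
```

A 3-element set fails to be a Sidon set exactly when it is an arithmetic progression, which gives an independent count for k = 3.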


Theorem 2 Let $\{k_n\}_{n=1}^{\infty}$ be a sequence of positive integers such that $k_n \le n$ for all n and
\[ k_n = o\left(n^{g/(2g+2)}\right). \]
Then
\[ \lim_{n \to \infty} \frac{B_2[g](k_n, n)}{\binom{n}{k_n}} = 1. \]
Proof. This follows immediately from Theorem 1. □


Theorem 3 Let $B_2(k, n)$ denote the number of Sidon sets of cardinality k contained in $\{1, \ldots, n\}$. If $k_n = o\left(n^{1/4}\right)$, then
\[ \lim_{n \to \infty} \frac{B_2(k_n, n)}{\binom{n}{k_n}} = 1. \]
Proof. This follows immediately from Theorem 2 with $g = 1$. □


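Theorem 4 Let $\{k_n\}_{n=1}^{\infty}$ be a sequence of positive integers such that $k_n \le n$ for all n and
\[ k_n = o\left(n^{g/(2g+3)}\right). \]
Then
\[ \lim_{n \to \infty} \frac{\sum_{k \le k_n} B_2[g](k, n)}{\sum_{k \le k_n} \binom{n}{k}} = 1. \]
Proof. It suffices to show that
\[ \lim_{n \to \infty} \frac{\sum_{k \le k_n} \left(\binom{n}{k} - B_2[g](k, n)\right)}{\sum_{k \le k_n} \binom{n}{k}} = 0, \]
where f(k) is defined in the proof of Theorem 1. If $a_1, \ldots, a_\ell, b_1, \ldots, b_\ell$ are positive real numbers and $B = \max(b_1, \ldots, b_\ell)$, then
\[ \frac{a_1 + \cdots + a_\ell}{b_1 + \cdots + b_\ell} \le \frac{a_1 + \cdots + a_\ell}{B} \le \frac{a_1}{b_1} + \cdots + \frac{a_\ell}{b_\ell}. \]
This implies that
\[ \frac{\sum_{k \le k_n} \left(\binom{n}{k} - B_2[g](k, n)\right)}{\sum_{k \le k_n} \binom{n}{k}} \le \sum_{k \le k_n} \frac{4k^{2g+2}}{n^g} \le \frac{4k_n^{2g+3}}{n^g} = 4\left(\frac{k_n}{n^{g/(2g+3)}}\right)^{2g+3} = o(1), \]
and the proof is complete. □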


We can restate our results in the language of probability. Let $\Omega$ be the probability space consisting of the $\binom{n}{k}$ subsets of $\{1, \ldots, n\}$ of cardinality k, where the probability of choosing $A \in \Omega$ is $1/\binom{n}{k}$. If $P_{2,g}(k, n)$ denotes the probability that a random set $A \in \Omega$ is a $B_2[g]$-set, then Theorem 2 states that
\[ \lim_{n \to \infty} P_{2,g}(k_n, n) = 1 \]
if $k_n = o\left(n^{g/(2g+2)}\right)$.

Similarly, Theorem 4 states that if $k_n = o\left(n^{g/(2g+3)}\right)$ and if $P'_{2,g}(k_n, n)$ denotes the probability that a random set $A \subseteq \{1, \ldots, n\}$ of cardinality $|A| \le k_n$ is a $B_2[g]$-set, then
\[ \lim_{n \to \infty} P'_{2,g}(k_n, n) = 1. \]


3 Random small $B_h$-sets

A set A is called a $B_h$-set if $r_{A,h}(m) = 1$ for all $m \in hA$. Let $B_h(k, n)$ denote the number of $B_h$-sets of cardinality k contained in $\{1, \ldots, n\}$. Since every set of integers is a $B_1$-set, and every $B_h$-set is a $B_{h-1}$-set, it follows that
\[ \binom{n}{k} = B_1(k, n) \ge \cdots \ge B_{h-1}(k, n) \ge B_h(k, n) \ge \cdots. \]




We shall prove that almost all “small” subsets of $\{1,\ldots,n\}$ are $B_h$-sets. The
method is similar to that used to prove Theorem 2, but, for $h \ge 3$, we have to
consider the possible dependence between different representations of an integer
$m$ as the sum of $h$ elements of $A$. This means the following: Let $(a_1,\ldots,a_h)$ and
$(a'_1,\ldots,a'_h)$ be $h$-tuples of elements of $A$ such that

\[
a_1 + \cdots + a_h = a'_1 + \cdots + a'_h,
\]
\[
a_1 \le \cdots \le a_h,
\]
\[
a'_1 \le \cdots \le a'_h,
\]
and
\[
\{a_1,\ldots,a_h\} \cap \{a'_1,\ldots,a'_h\} \ne \emptyset.
\]

If $h = 2$, then $a_i = a'_i$ for $i = 1,2$, but if $h \ge 3$, then it is not necessarily
true that $a_i = a'_i$ for all $i = 1,\ldots,h$. For example, in the case $h = 3$ we have
$1+3+4 = 1+2+5$ but $(1,3,4) \ne (1,2,5)$. In the case $h = 5$ we have
$1+1+2+3+3 = 1+2+2+2+3$ but $(1,1,2,3,3) \ne (1,2,2,2,3)$, even though
$\{1,1,2,3,3\} = \{1,2,2,2,3\}$.
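The two examples can be reproduced mechanically. The following sketch lists pairs of distinct nondecreasing $h$-tuples from $\{1,\ldots,n\}$ that have the same sum and share at least one element; the function name `overlapping_collisions` is a hypothetical label used only here.

```python
from itertools import combinations_with_replacement


def overlapping_collisions(h, n):
    """Pairs of distinct nondecreasing h-tuples from {1, ..., n} with equal
    sums and at least one common element."""
    by_sum = {}
    for t in combinations_with_replacement(range(1, n + 1), h):
        by_sum.setdefault(sum(t), []).append(t)
    pairs = []
    for tuples in by_sum.values():
        for i, s in enumerate(tuples):
            for t in tuples[i + 1:]:
                if set(s) & set(t):
                    pairs.append((s, t))
    return pairs


if __name__ == "__main__":
    print(overlapping_collisions(2, 10))      # no such pairs when h = 2
    print(overlapping_collisions(3, 5)[:3])   # pairs do occur when h = 3
```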

Because of the lack of independence, we need a careful description of a
representation of $m$ as the sum of $h$ not necessarily distinct integers. We introduce
the following notation. Let $A$ be a set of positive integers. Corresponding to
each representation of $m$ in the form

\[
m = a_1 + \cdots + a_h \tag{1}
\]
where $a_1,\ldots,a_h \in A$ and $a_1 \le \cdots \le a_h$, there is a unique triple
\[
(r, (h_j), (a'_j)), \tag{2}
\]

where
(i) $r$ is the number of distinct summands in this representation,

(ii) $(h_j) = (h_1,\ldots,h_r)$ is an ordered partition of $h$ into $r$ positive parts, that
is, an $r$-tuple of positive integers such that

\[
h = h_1 + \cdots + h_r,
\]

(iii) $(a'_j) = (a'_1,\ldots,a'_r)$ is an $r$-tuple of pairwise distinct elements of $A$ such


that
\[
a'_1 < \cdots < a'_r,
\]
and


(iv)
\[
m = h_1 a'_1 + \cdots + h_r a'_r,
\]
where each integer $a'_j$ occurs exactly $h_j$ times in the representation (1).
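The passage from a representation (1) to a triple (2) is easy to make concrete. The sketch below assumes that the distinct summands $a'_1,\ldots,a'_r$ are listed in increasing order (the ordering that appears later in the proof of Theorem 5); the names `to_triple` and `from_triple` are hypothetical.

```python
from collections import Counter


def to_triple(rep):
    """Map a representation (a_1 <= ... <= a_h) to (r, (h_j), (a'_j))."""
    mult = Counter(rep)
    distinct = tuple(sorted(mult))            # a'_1 < ... < a'_r (assumed ordering)
    parts = tuple(mult[a] for a in distinct)  # h_j = multiplicity of a'_j
    return len(distinct), parts, distinct


def from_triple(triple):
    """Rebuild the nondecreasing h-tuple from (r, (h_j), (a'_j))."""
    _, parts, distinct = triple
    rep = []
    for h_j, a_j in zip(parts, distinct):
        rep.extend([a_j] * h_j)
    return tuple(rep)


if __name__ == "__main__":
    for rep in [(1, 1, 2, 3, 3), (1, 2, 2, 2, 3)]:
        r, parts, distinct = to_triple(rep)
        m = sum(h_j * a_j for h_j, a_j in zip(parts, distinct))
        print(rep, "->", (r, parts, distinct), "m =", m)
```

The two representations of 10 from the $h = 5$ example use the same distinct summands and differ only in the ordered partition $(h_j)$.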




There is a one-to-one correspondence between distinct representations of an
integer $m$ in the form (1) and triples of the form (2). Moreover, for each $r$ and
$m$, the integer $a'_r$ is completely determined by the ordered partition $(h_j)$ of $h$
and the $(r-1)$-tuple $(a'_1,\ldots,a'_{r-1})$. Therefore, for positive integers $m$ and $r$,
the number of triples of the form (2) does not exceed $\pi_r(h) m^{r-1}$, where $\pi_r(h)$
is the number of ordered partitions of $h$ into exactly $r$ positive parts.
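The counting bound can be tested numerically in the extreme case where $A$ is the set of all positive integers. The sketch below enumerates ordered partitions and representations by brute force; the function names are hypothetical, and the identity $\pi_r(h) = \binom{h-1}{r-1}$ used in the check is the standard count of compositions, not a statement taken from the paper.

```python
from itertools import combinations_with_replacement
from math import comb


def compositions(h, r):
    """All ordered partitions of h into exactly r positive parts."""
    if r == 1:
        return [(h,)] if h >= 1 else []
    return [(first,) + rest
            for first in range(1, h - r + 2)
            for rest in compositions(h - first, r - 1)]


def count_triples(m, h, r):
    """Number of representations of m as a sum of h positive integers
    (order ignored) that use exactly r distinct summands."""
    return sum(1 for t in combinations_with_replacement(range(1, m + 1), h)
               if sum(t) == m and len(set(t)) == r)


if __name__ == "__main__":
    h = 4
    for r in range(1, h + 1):
        pi_r = len(compositions(h, r))
        assert pi_r == comb(h - 1, r - 1)
        for m in range(1, 11):
            # Bound from the text: at most pi_r(h) * m^(r-1) triples.
            assert count_triples(m, h, r) <= pi_r * m ** (r - 1)
    print("bound holds for h = 4 and m <= 10")
```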


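Theorem 5 Let $h \ge 2$. For all positive integers $k \le n$,

\[
B_{h-1}(k,n) - B_h(k,n) \ll \binom{n}{k} \frac{k^{2h}}{n}
\]
and
\[
B_h(k,n) > \binom{n}{k} \left(1 - O\left(\frac{k^{2h}}{n}\right)\right),
\]

where the implied constants depend only on $h$.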


Proof. Let $A$ be a $B_{h-1}$-set contained in $\{1,\ldots,n\}$. Then $hA \subseteq \{h, h+1, \ldots, hn\}$. If $m \in hA$ and $m$ has two distinct representations as the sum of $h$
elements of $A$, then there exist positive integers $r_1$ and $r_2$ and triples
\[
(r_1, (h_{1,j}), (a'_{1,j})) \quad\text{and}\quad (r_2, (h_{2,j}), (a'_{2,j})) \tag{3}
\]

such that, for $i = 1$ and $2$, we have

\[
m = \sum_{j=1}^{r_i} h_{i,j}\, a'_{i,j}
\]

and
\[
1 \le a'_{i,1} < \cdots < a'_{i,r_i} \le m.
\]

The number of pairs of triples of the form (3) for fixed positive integers $m$,
$r_1 \le h$, and $r_2 \le h$ is at most

\[
\pi_{r_1}(h)\, m^{r_1-1}\, \pi_{r_2}(h)\, m^{r_2-1} \ll m^{r_1+r_2-2},
\]

where the implied constant depends only on $h$. Moreover, since $A$ is a $B_{h-1}$-set,
no number can have two representations as the sum of $h-1$ elements of $A$. This
implies that

\[
\{a'_{1,1}, a'_{1,2}, \ldots, a'_{1,r_1}\} \cap \{a'_{2,1}, a'_{2,2}, \ldots, a'_{2,r_2}\} = \emptyset,
\]

and so the set

\[
\{a'_{i,j} : i = 1,2 \text{ and } j = 1,\ldots,r_i\}
\]




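contains exactly $r_1 + r_2$ elements of $A$. Therefore, given positive integers $r_1 \le h$,
$r_2 \le h$ and $m \le hn$, there are

\[
\ll m^{r_1+r_2-2} \binom{n - r_1 - r_2}{k - r_1 - r_2}
\]

sets $A$ for which $m$ has two representations as the sum of $h$ elements of $A$, and in
which one representation uses $r_1$ distinct integers and the other representation
uses $r_2$ distinct integers. Summing over $m \le hn$, we obtain

\[
\ll n^{r_1+r_2-1} \binom{n - r_1 - r_2}{k - r_1 - r_2}.
\]
Applying Lemma 1, we obtain

\begin{align*}
B_{h-1}(k,n) - B_h(k,n)
&\ll \sum_{r_1, r_2 \le h} n^{r_1+r_2-1} \binom{n - r_1 - r_2}{k - r_1 - r_2} \\
&\ll \binom{n}{k} \sum_{r_1, r_2 \le h} n^{r_1+r_2-1} \left(\frac{k}{n}\right)^{r_1+r_2}.
\end{align*}

Since $n^{r_1+r_2-1}(k/n)^{r_1+r_2} = k^{r_1+r_2}/n \le k^{2h}/n$ for $r_1, r_2 \le h$, this gives the first inequality of the theorem.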


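It follows that

\begin{align*}
\binom{n}{k} - B_h(k,n) &= B_1(k,n) - B_h(k,n) \\
&= \sum_{j=2}^{h} \left(B_{j-1}(k,n) - B_j(k,n)\right) \\
&\ll \binom{n}{k} \sum_{j=2}^{h} \frac{k^{2j}}{n},
\end{align*}

and so

\[
B_h(k,n) > \binom{n}{k} \left(1 - O\left(\frac{k^{2h}}{n}\right)\right).
\]

This completes the proof. □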
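The disjointness step in the proof (two distinct representations of $m$ as a sum of $h$ elements of a $B_{h-1}$-set cannot share a summand) can be checked by brute force for small parameters. The sketch below does this for $h = 3$; the function names and the choice of 4-element subsets of $\{1,\ldots,10\}$ are assumptions made for illustration.

```python
from itertools import combinations, combinations_with_replacement


def representations(A, h):
    """Group the nondecreasing h-tuples of elements of A by their sum."""
    by_sum = {}
    for rep in combinations_with_replacement(sorted(A), h):
        by_sum.setdefault(sum(rep), []).append(rep)
    return by_sum


def is_bh_set(A, h):
    """True if every element of hA has exactly one representation."""
    return all(len(reps) == 1 for reps in representations(A, h).values())


def overlapping_pairs(A, h):
    """Pairs of distinct representations of the same integer as a sum of h
    elements of A that have at least one summand in common."""
    return [(s, t)
            for reps in representations(A, h).values()
            for s, t in combinations(reps, 2)
            if set(s) & set(t)]


if __name__ == "__main__":
    h, n, k = 3, 10, 4
    for A in combinations(range(1, n + 1), k):
        if is_bh_set(A, h - 1):
            # A is a B_{h-1}-set, so colliding representations must be disjoint.
            assert overlapping_pairs(A, h) == []
    # A set that is not a B_2-set can have overlapping collisions.
    print(overlapping_pairs((1, 2, 3, 4, 5), 3)[:2])
```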


Theorem 6 Let $B_h(k,n)$ denote the number of $B_h$-sets $A$ contained in $\{1,\ldots,n\}$
with $|A| = k$. Let $\{k_n\}_{n=1}^{\infty}$ be a sequence of positive integers such that

\[
k_n = o\left(n^{1/2h}\right).
\]

Then

\[
\lim_{n\to\infty} \frac{B_h(k_n,n)}{\binom{n}{k_n}} = 1.
\]

Proof. By Theorem 5,

\[
1 \ge \frac{B_h(k_n,n)}{\binom{n}{k_n}} > 1 - O\left(\left(\frac{k_n}{n^{1/2h}}\right)^{2h}\right),
\]
and so
\[
\lim_{n\to\infty} \frac{B_h(k_n,n)}{\binom{n}{k_n}} = 1.
\]

This completes the proof. □


4 Remarks added in proof 


A variant of Theorem 3 appears in Nathanson [2, p. 37, Exercise 14]. Godbole,
Janson, Locantore, and Rapoport [1] used probabilistic methods to obtain a
converse of Theorem 6. They proved that if

\[
\lim_{n\to\infty} \frac{k_n}{n^{1/2h}} = \infty,
\]
then
\[
\lim_{n\to\infty} \frac{B_h(k_n,n)}{\binom{n}{k_n}} = 0.
\]
They also analyzed the threshold behavior of $B_h(k,n)$, and proved that if
\[
\lim_{n\to\infty} \frac{k_n}{n^{1/2h}} = \Lambda > 0,
\]
then
\[
\frac{B_h(k_n,n)}{\binom{n}{k_n}} \longrightarrow e^{-\lambda},
\]
where $\lambda = \kappa_h \Lambda^{2h}$.
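It is natural to conjecture that analogous results hold for the function $B_2[g](k,n)$,
namely, if

\[
\lim_{n\to\infty} \frac{k_n}{n^{g/(2g+2)}} = 0,
\]
then
\[
\lim_{n\to\infty} \frac{B_2[g](k_n,n)}{\binom{n}{k_n}} = 1,
\]
and if
\[
\lim_{n\to\infty} \frac{k_n}{n^{g/(2g+2)}} = \infty,
\]
then
\[
\lim_{n\to\infty} \frac{B_2[g](k_n,n)}{\binom{n}{k_n}} = 0.
\]
It should also be possible to describe the threshold behavior of $B_2[g](k_n,n)/\binom{n}{k_n}$
in the case
\[
\lim_{n\to\infty} \frac{k_n}{n^{g/(2g+2)}} = \Lambda > 0.
\]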


References 


[1] A. P. Godbole, S. Janson, N. W. Locantore, Jr., and R. Rapoport, Random
Sidon sequences, J. Number Theory 75 (1999), 7-22.

[2] M. B. Nathanson, Additive number theory: Inverse problems and the geometry
of sumsets, Graduate Texts in Mathematics, vol. 165, Springer-Verlag,
New York, 1996.


